ABSTRACT Advances in genetics have made it feasible to genetically engineer insect strains 
carrying a conditional lethal trait on multiple loci. We model the release into a target pest population 
of insects carrying a dominant and fully penetrant conditional lethal trait on 1-20 loci. Delaying the 
lethality for several generations after release allows the trait to become widely spread in the target 
population before being activated. To determine effectiveness and optimal strategies for such 
releases, we vary release size, number of generations until the conditional lethality, nonconditional 
fitness cost resulting from gene insertions, and fitness reduction associated with laboratory rearing. 
We show that conditional lethal releases are potentially orders of magnitude more effective than 
sterile male releases of equal size, and that far smaller release sizes may be required for this approach 
than necessary with sterile males. For example, a release of male insects carrying a conditional lethal 
allele that is activated in the $F_4$ generation on 10 loci reduces the target population to $10^{-4}$ of 
no-release size if there are initially two released males for every wild male. We show how the 
effectiveness of conditional lethal releases decreases as the nonconditional fitness reduction (i.e., 
fitness reduction before the trait becomes lethal) associated with the conditional lethal genes 
increases. For example, if there is a 5% nonconditional fitness cost per conditional lethal allele, then 
a 2:1 (released male:wild male) release with conditional lethal alleles that are activated in the $F_4$ 
generation reduces the population to 2-5% (depending on the degree of density dependence) of the 
no-release size. If there is a per-allele reduction in fitness, then as the number of loci is increased 
there is a trade-off between the fraction of offspring carrying at least one conditional lethal allele 
and the fitness of the released insects. We calculate the optimal number of loci on which to insert 
the conditional lethal gene given various conditions. In addition, we show how laboratory-rearing 
fitness costs, density-dependence, and all-male versus male-female releases affect the efficiency of 
conditional lethal releases. 

KEY WORDS conditional lethal, genetic control, multilocus, sterile male, model 



The concept of mass releasing genetically altered in- 
sects for purposes of pest control dates back more than 
50 yr. Serebrovsky (1940) proposed the use of insects 
carrying chromosomal translocations, and Knipling 
(1955) proposed the sterile insect technique (SIT) for 
the control of insect pests. Although translocations 
have not been widely used in pest control, SIT has 
been applied extensively against a variety of insects. 
Although it has had many successes (Klassen et al. 
1994), experience has shown that it works only when 
either there are resources available for a major wide- 
area effort (e.g., screwworm fly in the United States 
and Mexico) or it is applied to ecologically isolated 
areas. In either case, the biotic potential of the target 
population cannot be too high. 

To achieve eradication, a high ratio of sterile to 
normal males must be achieved for multiple genera- 
tions over an area large enough for migration to be 
low. To achieve high sterility in the target population, 
the ratio of sterilized to fertile insects must be high 
enough that a fertile insect has a low probability of 
mating with another fertile insect. If (1) the sterile 
insects in an all-male release have mating fitness equal 
to that of the wild males, (2) releases are done in such a way 
that the sterile insects are perfectly mixed with the 
wild ones, and (3) females mate only once, then the 
reduction in population in the next generation is equal 
to the percentage of sterile males in the male population. 
Unfortunately, the processes of sterilization 
and laboratory rearing result in individuals with fitness 
reduced by up to 80% (e.g., Holbrook and Fujimoto 1970, 
Hooper and Katiyar 1971, Ohinata et al. 1971). Fitness 
in the field may be even worse (e.g., Shelly et al. 1994). 
Because of the low fitness of sterilized insects, very 
large ratios (on the order of 10:1-1,000:1) of sterile to 
wild insects are required to achieve a high level of 
sterility in the field. Because the sterilized insects 
all die within one generation and leave no 
descendants, releases must be continued regularly 
throughout the season to keep the ratio high. 
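
The arithmetic behind these ratios can be sketched as follows. This is a minimal illustration, not part of the original analysis: it assumes conditions (1)-(3) above and, as an added simplification, that a reduction in the fitness of sterile males acts as a proportional reduction in their share of matings. The function names are hypothetical.

```python
def sterile_mating_fraction(release_ratio, relative_fitness=1.0):
    """Fraction of matings won by sterile males.

    release_ratio: sterile males per wild male.
    relative_fitness: mating fitness of a sterile male relative to a wild
    male (assumption: it simply scales the sterile males' mating share).
    """
    effective = release_ratio * relative_fitness
    return effective / (effective + 1.0)


def ratio_needed(target_reduction, relative_fitness=1.0):
    """Sterile:wild male ratio giving a target one-generation reduction."""
    if not 0.0 <= target_reduction < 1.0:
        raise ValueError("target_reduction must be in [0, 1)")
    return target_reduction / ((1.0 - target_reduction) * relative_fitness)


if __name__ == "__main__":
    # Sterile males with fitness reduced by 80% (relative fitness 0.2).
    for target in (0.9, 0.99):
        print(target, ratio_needed(target, relative_fitness=0.2))
```

Under these assumptions, an 80% fitness reduction multiplies the required release ratio by five, which is one way to see why ratios of tens to hundreds to one arise in practice.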

In response to these problems, a number of alternative 
genetic control mechanisms have been proposed 
(see Whitten 1985), including natural sterility (e.g., 
hybrid sterility and cytoplasmic incompatibility), 
translocations, meiotic drive and positive heterosis (to 
drive deleterious genes into the target population), 
and conditional lethal traits. Although these ideas 
have been around since the 1950s and 1960s, none 
has been implemented on a large scale, largely because 
of the difficulties in breeding and raising insects 
with the required characteristics. As a result, interest 
in these techniques declined in the 1980s and early 
1990s. 

The conditional lethal release method was first pro- 
posed by LaChance and Knipling (1962). In a condi- 
tional lethal release, insects are released into the field 
carrying a trait that is lethal only under restrictive 
conditions. If the trait does not become lethal imme- 
diately, then the trait can spread into the wild popu- 
lation through inter-mating between released and 
wild insects. Temperature-dependent lethal traits or 
traits causing a failure to diapause would be suitable. 

Theoretical work on the use of conditional lethal 
traits for pest control has been relatively sparse. 
Klassen et al. (1970a) and Klassen et al. (1970b) modeled 
releases of insects carrying conditional lethal traits 
controlled by up to four genes with additive effects. 
They showed that such releases could achieve greater 
reduction in pest population numbers than sterile 
male releases. More recently, Kerremans and Franz 
(1995) modeled the release of females carrying a 
recessive temperature-sensitive lethal mutation and a 
Y-autosome translocation. They found that the use of 
a single conditional lethal gene would be ineffective, 
but that a single conditional lethal could be useful in 
combination with other techniques. 

Primarily because of the lack of suitable genes, 
conditional lethals have not been used successfully for pest 
control. The production of dominant conditional lethal 
mutants by traditional genetic methods appears 
prohibitively difficult (Fryxell and Miller 1995). 

Recent success in genetically transforming insects 
(Handler et al. 1998, Coates et al. 1998, Jasinskiene et 
al. 1998) raises the hope that the stable genetic 
transformation of insects may become a routine undertaking 
in the near future (Atkinson and O'Brochta 1999 
review the progress of research in this area). Refinement 
of such genetic transformation techniques 
should allow the creation of insect strains carrying 
dominant conditional lethal traits. Conditional lethal 
genes known in, for example, D. melanogaster could be 
used with other species. Fryxell and Miller (1995) 
discussed the properties required of a conditional lethal 
gene suitable for general use as a pest control agent 
and demonstrated the existence of one such gene in D. 
melanogaster. 

Given these advances, it is appropriate to reexamine 
the use of conditional lethal traits for insect pest control. 
Conditional lethal releases with genetically transformed 
insects have (at least) three important differences 
from the conditional lethal releases envisioned 
by Klassen and coworkers. First, it should be possible 
to insert a much larger number of conditional lethal 
loci than could be attained through classical techniques. 
Second, nonconditional (i.e., constitutive) genetic 
load associated with the insertions is likely to be 
an important component of the dynamics of releases 
with conditional lethals on multiple loci. Third, Klassen 
and coworkers focused primarily on cases in which the 
conditional lethal alleles are not fully penetrant. With 
genetic transformation techniques it should be possible 
to use conditional lethal alleles that can cause close 
to 100% mortality with a single copy. We use a genetic 
model to determine the optimal number of copies of 
a conditional lethal allele to engineer into a release 
strain for varied genetic and ecological conditions. 

Specific Questions Addressed. In this article we as- 
sume that there is one dominant conditional lethal 
allele (i.e., coding DNA sequence) that can be in- 
serted into multiple sites of an insect's genome. We 
assume that it would be feasible to insert copies of the 
allele onto between one and 20 loci that are not phys- 
ically linked. We address the following specific ques- 
tions: 

Ideally, How Effective Can Multilocus Conditional 
Lethal Releases Be? How would the effectiveness of an 
ideal (i.e., released insects have no reduction in fit- 
ness) multilocus conditional lethal release compare 
with an ideal sterile male release? For a given size of 
release, what is the greatest reduction in the pest 
population that could be achieved? 

What Are the Effects of a Fitness Cost Due to the 
Released Alleles? It is unlikely that a large number of 
gene copies can be inserted into the genome of an 
insect without doing nonconditional genetic damage 
(e.g., insertions within coding regions). Given this, 
how much nonconditional genetic load can the released 
insects carry and still be effective in spreading 
the conditional lethal gene into the pest population? 

What is the Optimal Number of Loci at Which To 
Insert the Conditional Lethal Gene? The probability 
that the descendants of matings between released- 
and wild-type individuals will pass on at least one copy 
of the conditional lethal allele to their offspring is 
increased by each additional locus on which the 
conditional lethal allele is inserted. If the 
released alleles carry no fitness cost, then the eventual 
reduction in the pest population is therefore greater for each 
additional locus. However, if each additional conditional 
lethal allele does carry a constitutively expressed 
fitness cost, then there should be an optimal 
locus number balancing the fitness reduction of the 
released insects and their offspring with the increase 
in the fraction of offspring carrying the conditional lethal 
allele. 
How Much Difference in Effectiveness is there Be- 
tween an All-Male Release and a Male-Female Release? 
How much is the pest potential of the population 
increased by release of females? A drawback to mass- 
release schemes for pest control is that the release can 
increase the number of pests in the field in the short 
run. In many cases, this is not economically accept- 
able. One way to overcome this is through male-only 
releases. Many agricultural pest species are pestiferous 
only in their larval stage. If only adult males are re- 
leased, then there will be no increase in the number 
of larvae and no contribution to population growth. 
Unfortunately, separating insects by gender can be 
difficult. Furthermore, releasing the females along 
with the males increases the size of the release and 
therefore increases the reduction in the pest population 
when the conditional lethal allele does become 
lethal. It would be useful to know how much is gained 
in long-term reduction and lost in short-term increase 
in pests when the females are released along with 
males. 

How Much Do Decreases in Field Fitness Resulting 
from Laboratory Rearing Decrease the Effectiveness of 
This Technique? Insects raised in the laboratory often 
have a reduced fitness in field conditions. This can 
result from maternal effects, inbreeding, drift, or 
selection by laboratory conditions (see, for example, 
Hopper et al. 1993 or Mackauer 1976). This laboratory-rearing 
load has been a major impediment to mass-release 
schemes. Laboratory-rearing effects with a genetic 
basis have stronger negative impacts in a 
conditional lethal release than a sterile male release 
because there are more generations of field selection 
against deleterious laboratory traits. It would be useful 
to know how much fitness reduction from specific 
laboratory-rearing traits can be sustained before 
conditional lethal releases become ineffective. 

Methods 

General Model Parameters and Terminology. The 
model simulates the release of genetically engineered 
insects of a diploid species into a wild population. The 
time scale of the model is generations. The model 
parameters and output variables are as given below: 

Parameters. L: The number of loci on which the 
conditional lethal trait is inserted. C: The nonconditional 
fitness reduction per inserted conditional lethal 
allele. This is assumed to be caused by random damage 
to the genome caused by the insertion of the alleles. 
I: The fraction of the target population that are 
released insects immediately after the release. 

Output Variables. (1) The frequency of the genotype 
with no conditional lethal alleles. Insects with this 
genotype will be the only survivors when the conditional 
lethal allele is activated and becomes lethal. (2) 
The number of insects with the no-conditional lethal 
genotype relative to the number in a population where 
there is no release, assuming no density dependence 
in mortality. This is the number (as opposed to frequency) 
of survivors when the conditional lethal allele 
becomes lethal. (3) The population size relative to 
that with no release, assuming no density dependence 
in mortality. This is calculated as the running product 
(over generations) of the average fitnesses. See below 
for more detail. (4) The gametic disequilibrium of the 
no-conditional lethal gamete type. This is defined as 
the difference between the no-conditional lethal gamete 
frequency and what it would be with no statistical 
association between loci. 

Components of the Model. Calculating Gametic and 
Genotype Frequencies. In general, multilocus problems 
are difficult, because of the large number of genetic 
states possible with multiple loci ($2^L$ possible gamete 
types and $3^L$ possible genotypes). Working with 10 
loci requires 1,024 gamete types and 59,049 genotypes. 
Working with 20 loci requires over 1 million gamete 
types and over 3 billion genotypes. With the rapid 
advance of gene technology, a release involving 10-20 
loci is conceivable. The multilocus problem is simple 
if the loci are in equilibrium. In this case the loci are 
in random association and gamete frequencies are 
simply the product of gene frequencies across loci. 
However, we are interested in the period immediately 
after the introduction of a population of one genotype 
into a population of another genotype. In this situation, 
there are correlations (henceforth called gametic 
disequilibrium) between allelic states at different loci. 
Initially, all individuals carrying the conditional lethal 
allele on one locus have it on all loci, and this association 
breaks down only gradually. Thus, we must track 
all genotypes as they change over time. 

Fortunately, the problem can be simplified. The 
same trait is being released at the same frequency on 
every locus. If selection acts the same on all loci, then 
the frequencies of all genotypes with the same number 
of released alleles will be equal (regardless of which 
loci the alleles are on). In this case, we only have to 
track one variable for each possible number of released 
alleles per gamete: L variables instead of $3^L$. The 
probabilistic calculations required to do this are involved, 
but they yield a fast and easily programmable 
algorithm (see Appendix 1). If we wish to break up the 
loci into multiple fitness types, then the number of 
variables is $(L_1+1)(L_2+1)(L_3+1)\cdots$, where $L_1$ is the 
number of loci with fitness type 1, and so on. 

As inter-mating between released and wild populations 
occurs, the gametic disequilibrium breaks 
down. The gamete frequencies approach the products 
of the individual gene frequencies. In the absence of 
selection against the released trait, the gamete frequency 
of the no-conditional lethal gamete converges 
to $(1-p)^L$, where p is the initial frequency of the conditional 
lethal allele. The no-conditional lethal genotype 
frequency converges to the square of the gamete 
frequency: $(1-p)^{2L}$. 

Nonconditional Genetic Load. The no-conditional 
lethal genotype insects (insects with no copies of the 
conditional lethal allele) are assumed to have maximum 
fitness. Other genotypes have a fitness reduction 
determined by the number of conditional lethal alleles. 
This fitness reduction is nonconditional: that is, 
it reduces the fitness of the insects carrying it under all 
environmental conditions, and therefore reduces the 
penetration of the conditional lethal allele into the 
target population. We normally assume that this fitness 
cost is the same on all loci. However, we also work 
with two classes of loci with distinct fitness costs to 
explore the sensitivity of results to the assumption of 
equal fitness costs. 

Mackay et al. (1992) studied the effects of P-element 
insertions on viability of Drosophila melanogaster. 
They found that on average each insertion 
decreased viability of insects by 5.5% for heterozygotes 
and 12.2% for homozygotes (however, some 
insertions appeared to have no effect on viability). 
Regression of viability on the number of insertions 
yielded an expression with significant linear and quadratic 
terms. The resulting quadratic expression is 
reasonably approximated by a multiplicative expression 
of the form $w(y) = (1 - \mathit{cost})^{y}$, where $w(y)$ is the 
fitness of the genotype, cost is the reduction in fitness 
per inserted allele, and y is the number of conditional 
lethal alleles in the genotype. Decreases in viability 
likely underestimate the total decrease in fitness. 
However, Mackay et al. were working with random 
insertion events; these insertions can be screened to 
find the ones that cause the least fitness damage. We 
use values of 2.5, 5, and 10% for cost. This expression 
applies until the conditional lethal allele becomes lethal, 
at which point the fitness of all individuals carrying 
any copies of the conditional lethal allele is zero. 
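
The multiplicative fitness expression is simple to evaluate directly. The sketch below is illustrative only; the function names and the `activated` flag are hypothetical, and the released genotype is taken to carry two copies at each of L loci, as described for the release generation.

```python
def fitness(n_alleles, cost, activated=False):
    """Relative fitness of a genotype carrying n_alleles lethal alleles.

    Before activation: (1 - cost) ** n_alleles.
    After activation: zero for any carrier, one for non-carriers.
    """
    if n_alleles < 0:
        raise ValueError("n_alleles must be non-negative")
    if activated and n_alleles > 0:
        return 0.0
    return (1.0 - cost) ** n_alleles


def released_fitness(n_loci, cost):
    """Fitness of a released insect homozygous at every locus (2L copies)."""
    return fitness(2 * n_loci, cost)


if __name__ == "__main__":
    # The three per-allele costs used in the text, for 10 loci.
    for cost in (0.025, 0.05, 0.10):
        print(cost, released_fitness(10, cost))
```

Because the cost compounds over 2L copies, even a small per-allele cost implies a substantial fitness reduction for released insects when L is large, which is the trade-off the optimal-locus-number question addresses.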

When there are two types of loci, the marginal 
fitness is given by 

$$w(y_1, y_2) = (1 - \mathit{cost}_1)^{y_1} (1 - \mathit{cost}_2)^{y_2}$$

where $y_1$ is the number of alleles with fitness cost $\mathit{cost}_1$ 
and $y_2$ is the number of alleles with fitness cost $\mathit{cost}_2$. 
See Appendix 1 for more details on how selection is 
applied. 

We assume throughout this article that mating is 
random and that there is no sexual selection. Thus, we 
cannot take into account (for example) preferences 
for mating between like types (e.g., wild with wild and 
released with released) or preferences of females of all 
genotypes for wild-type males. Such preferences can 
be crucial in sterile male releases. Thus, this assump- 
tion may be a major weakness in the model. 

Population Dynamics. The number of surviving insects 
is the quantity of interest for pest control. Thus, 
genotype frequencies alone are not sufficient information. 
Although realistically modeling population dynamics 
is beyond the scope of this article, we can 
examine the two theoretical extremes of population 
regulation. 

If mortality is density-independent, then the cumulative 
product of the population average fitnesses in 
each generation gives a measure of population size 
relative to what it would have been with no release. 
For example, assume a discrete population growth 
model with no density dependence and population 
growth rate $w_{avg}(t) R_t$, where $w_{avg}(t)$ is the average 
fitness at time t relative to the fitness of a wild-type 
individual and $R_t$ is the population growth rate at time 
t for a wild-type population. Then, the population at 
time t + 1 is 

$$N_{t+1} = w_{avg}(t)\, R_t\, N_t$$

and 

$$N_t = w_{avg}(t-1) R_{t-1}\, w_{avg}(t-2) R_{t-2} \cdots w_{avg}(0) R_0\, N_0$$

where $N_0$ is the population size in generation 0 and $N_t$ 
is the population size in generation t. If there is no 
release, then $w_{avg}(t) = 1$ for all t. If there is a release, 
then $w_{avg}(t)$ is less than 1 and the cumulative product 
of average fitnesses gives population size relative to 
that with no release. If this relative population size is 
multiplied by the no-conditional lethal genotype frequency, 
then we have the number of no-conditional 
lethal insects (after a release) relative to the total 
number of insects with no release. This quantity will 
be referred to as the "relative number" of no-conditional 
lethal insects. It should be kept in mind that this 
quantity is meaningful only in the case of density 
independence. 
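
The running product and the "relative number" can be computed in a few lines. This is a sketch under the density-independent assumption only; the function names are hypothetical and the fitness and frequency values are made-up inputs for illustration, not results from the model. How the two series are aligned by generation is left to the caller.

```python
from itertools import accumulate
from operator import mul


def relative_population_sizes(avg_fitnesses):
    """Running product of per-generation average fitnesses.

    Element k is the product of the first k + 1 average fitnesses, i.e. the
    population size relative to a no-release population after those
    generations of selection.
    """
    return list(accumulate(avg_fitnesses, mul))


def relative_numbers(avg_fitnesses, no_cl_frequencies):
    """Relative number of no-conditional lethal insects per generation."""
    sizes = relative_population_sizes(avg_fitnesses)
    return [size * freq for size, freq in zip(sizes, no_cl_frequencies)]


if __name__ == "__main__":
    avg_w = [0.8, 0.9, 0.95]    # hypothetical average fitnesses
    no_cl = [0.5, 0.2, 0.1]     # hypothetical genotype frequencies
    print(relative_population_sizes(avg_w))
    print(relative_numbers(avg_w, no_cl))
```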

The other extreme in population regulation occurs 
when density-dependence is so strong that the pop- 
ulation is always at carrying capacity. In this case the 
no-conditional lethal genotype frequency alone is a 
measure of no-conditional lethal numbers relative to 
no release. Obviously, no real population behaves this 
way (and conditional lethal releases would not be 
useful against it if it did) , but by looking at this extreme 
we can get a sense of the importance of population 
dynamics. Output of the population state will always 
show genotype frequencies and genotype numbers 
relative to no release. 

Pest Damage. In the context of asking whether females 
should be included in conditional lethal releases, 
we are interested in how much additional pest 
damage will be caused by the release. This is linked 
with how the population's density is regulated. Again, 
we can only look at the extreme cases. If the popula- 
tion is always at carrying capacity, then population 
numbers remain constant and the release has no effect 
on population size and therefore no effect on pest 
damage. If mortality is density-independent, then the 
cumulative average fitness of the population is the 
population size relative to no release and is therefore 
related to pest damage. The average fitness of a gen- 
eration is calculated as the average reproduction rel- 
ative to that of a population with only wild-type in- 
sects. The relative number of eggs laid at the beginning 
of a generation is given by the cumulative average 
fitness through the previous generation, and the rel- 
ative number of adults who reproduce in a generation 
is given by the cumulative average fitness through that 
generation. The pest damage caused by a genera- 
tion should be proportional to a value between the 
previous generation's cumulative average fitness and 
the current generation's cumulative average fitness. 
If females are released, then cumulative average 
fitness becomes $w_{avg}(t)\, w_{avg}(t-1)\, w_{avg}(t-2) \cdots w_{avg}(0)\,(1+f)$, 
where f is the number of females released 
relative to the number of wild females present before 
the release. 
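
A short sketch of this adjustment, and of the bracketing of pest damage between two consecutive cumulative fitness values, is given below. It assumes density-independent mortality; the function names and the example inputs are hypothetical.

```python
import math


def cumulative_fitness(avg_fitnesses, female_ratio=0.0):
    """Product of average fitnesses, scaled by (1 + f) for released females.

    female_ratio: females released per wild female present before release
    (zero for an all-male release).
    """
    return math.prod(avg_fitnesses) * (1.0 + female_ratio)


def damage_bounds(avg_fitnesses, t, female_ratio=0.0):
    """Bracket for relative pest damage in generation t.

    Lower and upper values are the cumulative fitness through generation
    t - 1 and through generation t, returned in ascending order.
    """
    previous = cumulative_fitness(avg_fitnesses[:t], female_ratio)
    current = cumulative_fitness(avg_fitnesses[:t + 1], female_ratio)
    return min(previous, current), max(previous, current)


if __name__ == "__main__":
    avg_w = [0.8, 0.9]          # hypothetical average fitnesses
    print(cumulative_fitness(avg_w, female_ratio=0.5))
    print(damage_bounds(avg_w, t=1, female_ratio=0.5))
```

A cumulative value above 1 in early generations corresponds to the short-run increase in pest numbers that motivates male-only releases.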

Laboratory-Rearing Costs. Fitness reduction of lab- 
oratory-reared insects can be grouped into maternal 
effects and genetic effects. 

Maternal effects include malnutrition and diseases. 
Most maternal effects (excepting disease) will not 
extend beyond the release generation. Such effects are 
equivalent to a reduction in the size of the released 
population, and can be modeled as such. Diseases that 
can be transmitted vertically between generations are 
an exception to this. Such diseases may often be the 
primary agent for fitness reduction in laboratory pop- 
ulations (Hopper et al. 1993), but are beyond the 
scope of this article. 

Fig. 1. Mating scheme for the all-male release. I is the fraction that the released males make up of the total male population. 
$W_{allCL}$ is the fitness of the all-released genotype, $W_{noCL}$ is the fitness of the no-conditional lethal genotype, and $W_{avg}$ is the 
average fitness of the population. 



Deleterious genetic changes in the laboratory can 
result from drift, inbreeding, or selection arising from 
laboratory conditions. Although deterioration in fitness 
of laboratory colonies has often been observed 
(e.g., Bigler et al. 1982, van Bergeijk et al. 1989, Geden 
et al. 1992), the genetic basis of such change has rarely 
been verified, let alone explored in detail (Hopper et 
al. 1993). We will, however, assume that there is 
genetically based reduction in fitness resulting from 
laboratory rearing. Because most deleterious mutations 
are recessive (Lynch and Walsh 1998), we will model 
deleterious genetic effects resulting from laboratory 
rearing as recessive traits. 

Deleterious recessive traits are modeled as a group 
of l identical loci, such that an insect homozygous for 
the laboratory trait on all l loci experiences maximum 
fitness reduction. The relative fitness of a genotype 
with x loci homozygous for the laboratory trait and y 
copies of the conditional lethal allele is 
$w(x, y) = (1 - \mathit{cost})^{y} (1 - \mathit{labcost})^{x}$, where cost is the fitness 
reduction from each conditional lethal allele and labcost is 
the fitness reduction for each homozygous recessive 
locus. 

It can be shown (Schliekelman 2000) that the num- 
ber of loci contributing to a fixed total fitness reduc- 
tion on the laboratory trait loci does not affect the 
frequency of insects that are no-conditional lethal on 
the conditional lethal loci. For simplicity, we use two 
loci for the laboratory trait and 50 and 20% (in separate 
model runs) as the relative fitnesses of individuals 
homozygous for the laboratory allele on both loci. 
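
The per-locus laboratory cost implied by a given total fitness for the double homozygote follows from the multiplicative form. The sketch below is an illustration with hypothetical function names, assuming the two laboratory loci contribute equally.

```python
def per_locus_labcost(total_relative_fitness, n_lab_loci=2):
    """labcost such that (1 - labcost) ** n_lab_loci equals the total fitness."""
    return 1.0 - total_relative_fitness ** (1.0 / n_lab_loci)


def combined_fitness(x, y, cost, labcost):
    """w(x, y) = (1 - cost) ** y * (1 - labcost) ** x.

    x: number of loci homozygous for the laboratory trait.
    y: number of conditional lethal alleles carried.
    """
    return (1.0 - cost) ** y * (1.0 - labcost) ** x


if __name__ == "__main__":
    # The two total fitnesses used in the text, spread over two loci.
    for total in (0.5, 0.2):
        labcost = per_locus_labcost(total)
        print(total, labcost, combined_fitness(2, 0, 0.05, labcost))
```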

Release of Insects. Two types of insect releases are 
simulated: all-male and male-female. The size of re- 
leases is expressed as the ratio of number released to 
total (both genders) number of wild insects. Examples 
of both types of releases are given in Appendix 2. 



In the all-male release there will be two types of 
matings in the release generation: no-conditional lethal 
genotype males with no-conditional lethal females 
and all-conditional lethal genotype males with 
no-conditional lethal females (see Fig. 1). If I is the 
fraction of males with the released genotype, then the 
proportion of matings involving a male of the all-conditional 
lethal type will be the product of I and the 
fitness of the all-conditional lethal genotype. The offspring 
from these matings will be hemizygous for the 
conditional lethal allele on each locus (the term hemizygous 
is used instead of heterozygous to indicate that 
there is no alternate allele to the conditional lethal 
allele). Likewise, the proportion of matings involving 
no-conditional lethal males will be (1 - I) multiplied by 
the fitness of the no-conditional lethal genotype. The 
offspring from these matings will carry no copies of the 
conditional lethal allele. In the second generation and 
beyond, mating is random and proceeds according to 
the algorithm described in Appendix 1. 
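
The release-generation bookkeeping can be sketched as follows. This is a hedged illustration with a hypothetical function name: the text gives the two unnormalized mating proportions, and dividing them by their sum (the average fitness of the male population) so that the offspring frequencies add to one is an assumption suggested by the Fig. 1 caption. Released males are taken to carry 2L copies of the allele.

```python
def f1_frequencies(release_fraction, n_loci, cost):
    """Offspring genotype frequencies after an all-male release.

    release_fraction: fraction I of males that have the released genotype.
    Returns (hemizygous at every locus, no conditional lethal alleles).
    """
    w_all = (1.0 - cost) ** (2 * n_loci)   # released males, homozygous
    w_none = 1.0                           # wild males, maximum fitness
    mated_released = release_fraction * w_all
    mated_wild = (1.0 - release_fraction) * w_none
    mean_w = mated_released + mated_wild   # normalization (assumption)
    hemizygous = mated_released / mean_w
    return hemizygous, 1.0 - hemizygous


if __name__ == "__main__":
    print(f1_frequencies(0.8, n_loci=10, cost=0.0))
    print(f1_frequencies(0.8, n_loci=10, cost=0.05))
```

With no fitness cost the hemizygous fraction equals I; with a cost, the released males' share of matings shrinks accordingly.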

We assume that the released insects experience the 
full fitness cost due to the conditional lethal allele. 
Thus, the results will be a conservative estimate of 
effectiveness of the conditional lethal release method, 
because some of the fitness reduction is likely to occur 
in immature life stages. 

In this article, we use an all-male release as the 
standard. We show (see Results section) that the difference 
between all-male and male-female releases is 
small if the size of the release is the same (that is, the 
number of males in the all-male release equals the 
number of males and females in the male-female 
release). Thus, all results can be applied to either type 
of release. 

In the male-female release, there is random mating 
in the release generation. The initial cumulative average 
fitness is adjusted to account for the increase in 
matings resulting from the release of females. 

Fig. 2. 2:1 all-male release with no fitness cost for the released-type allele. (a) Frequency of the no-conditional lethal 
genotype shown on a log scale. (b) Gametic disequilibrium of the no-conditional lethal gamete type. This is calculated as 
$D = N_{AW} - p^{L}$, where $N_{AW}$ is the frequency of the no-conditional lethal gamete and p is the no-conditional lethal allele 
frequency. See text for explanation of the shape. 

Sterile male and other genetic control releases have 
generally been very large. Typically, releases have 
been made once per week or more often (e.g., 
Davidson [1974]), with released-to-wild ratios of 
100:1 not uncommon (e.g., Davidson [1974]). A conditional 
lethal release should require much smaller 
release sizes at less frequent intervals. We will simulate 
all-male releases of sizes 1:6, 1:2, 2:1, and 19:2, 
which are chosen as very small, small, medium, and 
large releases. 

Results 

"Ideal" Release. By ideal release we mean one in 
which the released insects have fitness equal to the 
wild type. This gives a bound on how much the 
method can reduce target population size, and is use- 
ful for comparisons with other genetic control tech- 
niques. 

Figure 2a shows the frequency (on a log scale) of 
the no-conditional lethal genotype over time (generations) 
when an allele with no fitness cost is released 
on 2, 6, 10, 14, or 18 loci at a 2:1 ratio of released to 
native individuals. Only the no-conditional lethal genotype 
will survive when the conditional lethal allele 
becomes lethal. 

The no-conditional lethal frequency drops rapidly 
with each generation as the gametic disequilibrium breaks 
down and alleles from the released strain become 
dispersed throughout the wild population. As the disequilibrium 
drops, the no-conditional lethal frequencies 
approach their equilibrium values of $0.6^{2L}$ (where 
0.6 is the frequency of the no-conditional lethal allele on 
each individual locus; see Appendix 2). Table 1 shows 
the no-conditional lethal genotype frequencies at two, 
four, and six generations as a function of L. By the 
fourth generation, the frequency of the no-conditional 
lethal genotype drops to under $4.5 \times 10^{-4}$ for L = 10 
and under $2.4 \times 10^{-5}$ for L = 18. Table 1 also shows 
the same information for 1:6 and 19:2 all-male releases. 
Even small releases can be effective for large L. With 
a release as small as one released insect for every six in the 
wild population, the wild population can potentially 
be reduced to 12% of its original size if the conditional 
lethal allele is activated in the $F_4$ generation and 2% of 
its original size if the conditional lethal allele is activated 
in the $F_6$ generation. The large (19:2) release 
reduces the no-conditional lethal population to frequencies 
of $10^{-7}$ to $10^{-10}$ in later generations. 
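
The equilibrium values can be reproduced with a short calculation. The sketch below rests on several assumptions that should be checked against Appendix 2: release size is read as released males per wild insect of both sexes, the wild sex ratio is 1:1, there is no fitness cost, and the conditional lethal allele frequency after the release generation is half the released fraction of males. Under this reading a 2:1 release gives the base of 0.6 quoted above. The function names are hypothetical.

```python
def release_fraction(released_per_wild):
    """Fraction I of all males that are released males.

    released_per_wild: released males per wild insect (both sexes), so the
    wild males are half of the wild population (1:1 sex ratio assumed).
    """
    wild_males = 0.5
    return released_per_wild / (released_per_wild + wild_males)


def equilibrium_no_cl_genotype(released_per_wild, n_loci):
    """(1 - p) ** (2 L), with p = I / 2 the conditional lethal allele frequency."""
    p = release_fraction(released_per_wild) / 2.0
    return (1.0 - p) ** (2 * n_loci)


if __name__ == "__main__":
    for label, ratio in (("1:6", 1 / 6), ("2:1", 2.0), ("19:2", 9.5)):
        for n_loci in (2, 10, 18):
            print(label, n_loci, equilibrium_no_cl_genotype(ratio, n_loci))
```

These are limiting values; the frequencies reported for particular generations lie above them while gametic disequilibrium persists.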

The gametic disequilibrium of the no-conditional lethal type (the
difference between the no-conditional lethal gamete frequency and that
if loci were in equilibrium) is shown in Fig. 2b. The initial sharp
increase is the result of a peculiarity of all-male releases explained
in Appendix 2. The gametic disequilibrium drops sharply in the $F_2$
generation, but remains substantial for three to four generations.
This indicates that a nonequilibrium model is necessary.
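The disequilibrium defined above can be made concrete for the gametes produced by $F_1$ adults. The following is a minimal sketch, not the model used in the paper: it assumes unlinked loci, no fitness cost, and that a fraction 0.8 of $F_1$ individuals were sired by released males (the value implied by a per-locus wild-type frequency of 0.6). The function name and its default argument are hypothetical.

```python
def no_cl_gamete_disequilibrium(n_loci, released_sire_fraction=0.8):
    """Disequilibrium of the no-CL gamete among gametes made by F1 adults.

    Defined as (observed no-CL gamete frequency) minus (the frequency
    expected if loci were independent). Assumes unlinked loci and no
    fitness cost; F1 adults either carry no CL allele or carry exactly
    one CL allele at every locus.
    """
    wild_f1_fraction = 1.0 - released_sire_fraction
    # A carrier F1 parent passes on the wild-type allele at each locus
    # with probability 1/2, independently across unlinked loci.
    observed = wild_f1_fraction + released_sire_fraction * 0.5 ** n_loci
    wild_allele_freq = 1.0 - released_sire_fraction / 2.0
    expected = wild_allele_freq ** n_loci
    return observed - expected


if __name__ == "__main__":
    for n_loci in (1, 2, 10, 18):
        print(n_loci, no_cl_gamete_disequilibrium(n_loci))
```

With a single locus the disequilibrium is zero by construction; with many loci nearly all of the no-CL gametes come from the all-wild subpopulation, so the excess over the equilibrium expectation is large.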

Effect of Nonconditional Fitness Cost of the Released Allele. Five
Percent Fitness Cost per Conditional Lethal Allele. Fig. 3 shows the
no-conditional lethal genotype frequency (Fig. 3a) and relative number
(Fig. 3b) in a 2:1 all-male introduction in which there is a 5% fitness
cost per conditional lethal allele. In the release generation, all
individuals either have no conditional lethal alleles or they have
conditional lethal alleles in homozygous form at all loci. This second
group of individuals has the minimum fitness. Accordingly, there is
strong selection against the conditional lethal alleles in the release
generation and the no-conditional lethal frequency increases. In the
$F_1$ generation all individuals have either no conditional lethal
alleles or they have conditional lethal alleles in hemizygous form at
all loci (see Fig. 1). Thus, selection against the conditional lethal
allele is weaker and the no-conditional lethal type drops in frequency
in subsequent generations (except for L = 16-20) as inter-mating occurs
between the native and released population and the gametic
disequilibrium breaks down (and the number of individuals with many or
no conditional lethal alleles decreases). However, the rate of
disequilibrium breakdown decreases over time and selection is
continuously acting in favor of the wild-type genes. Selection again
becomes stronger than the impact of decreasing gametic disequilibrium
after several generations and the no-conditional lethal genotype
frequency begins increasing. Thus, there are three phases to the
dynamics: phase 1, when the conditional lethal alleles are in high
association, selection is dominant, and the frequency of the
no-conditional lethal genotype is increasing; phase 2, when the impact
of the breakdown of gametic disequilibrium is stronger than selection
and the no-conditional lethal genotype frequency decreases; and phase
3, when gametic disequilibrium has dissipated and selection against the
conditional lethal allele becomes dominant again. Not all phases exist
in all cases. The relative impact of selection and breakdown of gametic
disequilibrium (and thus the relative impact of the three phases)
varies as L increases. For larger values of L there is very strong
selection against the conditional lethal allele in the first few
generations because all conditional lethal alleles are in association
with many other conditional lethal alleles. With L = 20, for example,
selection remains dominant through the $F_2$ generation and the
no-conditional lethal genotype frequency increases rapidly. It is only
when the conditional lethal alleles have become dispersed in the
population (and thus the number of individuals with many conditional
lethal alleles is low) that the effect of disequilibrium breakdown
temporarily overpowers selection. Thus, the impact of phase 1 is great,
whereas the impact of phase 2 is small. However, there is no phase 1
with L = 2. With few loci, selection against the conditional lethal
allele is not strong enough to overcome the impact of the breakdown of
disequilibrium until phase 3. This makes it clear why using more loci
in a release is not always better.
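The strength of early selection at high L can be illustrated with a small calculation. The sketch below assumes a multiplicative per-allele cost, so that an individual carrying k conditional lethal alleles has fitness $(1 - c)^k$; this form is an assumption chosen to agree with the statement elsewhere in the text that the homozygote fitness at one locus equals the square of the hemizygote fitness. The function name is hypothetical.

```python
def genotype_fitness(n_cl_alleles, cost_per_allele):
    """Nonconditional fitness of a genotype carrying n_cl_alleles CL alleles.

    Assumes each CL allele multiplies fitness by (1 - cost_per_allele).
    """
    return (1.0 - cost_per_allele) ** n_cl_alleles


if __name__ == "__main__":
    cost = 0.05
    for n_loci in (2, 10, 20):
        released = genotype_fitness(2 * n_loci, cost)  # homozygous at all loci
        f1_carrier = genotype_fitness(n_loci, cost)    # one CL allele per locus
        print(n_loci, round(released, 3), round(f1_carrier, 3))
```

Under this assumption the fitness of the released insects falls geometrically with L, which is the reason phase 1 dominates when L is large and is negligible when L = 2.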

Figure 3d (gametic disequilibrium of the no-conditional lethal gamete
type) helps clarify this. The immediate strong selection against the
conditional lethal allele in the high L case slows the rate of
breakdown of gametic disequilibrium relative to lower L. Thus, lower L
is favored if conditional lethal becomes lethal early. The efficacy of
the release is sensitive to the generation in which conditional lethal
becomes lethal. If the number of generations is small, then gametic
disequilibrium (and therefore, no-conditional lethal frequency) is
still high. If the number of generations is too large, then selection
in favor of the wild-type allele has driven no-conditional lethal
frequency back up.

Figure 3c shows the population size relative to no release in the case
of density-independent mortality. We see that nonconditional genetic
load causes substantial population suppression even before the
conditional lethal alleles become lethal.

Fig. 3. 2:1 all-male release with a 5% cost per released-type allele in
the genotype. (a) Frequency of no-conditional lethal genotype. In the
case of population always at carrying capacity, this is the number of
no-conditional lethal relative to that with no release. (b) Relative
number of no-conditional lethal genotype. (No-conditional lethal
genotype frequency multiplied by cumulative average fitness.) In the
case of density-independent mortality, this is the number of
no-conditional lethal genotype insects relative to no release. (c)
Population size relative to no release when mortality is density
independent. (d) Gametic disequilibrium of the no-conditional lethal
gamete type. The curves go consecutively from L = 2 to L = 20 in the
order shown.

Note that the no-conditional lethal relative number 
reaches an asymptote as the gametic disequilibrium 
goes to zero. It can be shown that this occurs as a result 
of the fitness scheme used here (specifically, it re- 
quires that the fitness of a conditional lethal homozy- 
gote on one locus is equal to the square of the fitness 
of the hemizygote). Thus, the no-conditional lethal 
subpopulation grows at a rate that is independent of 
the genetic structure of the remainder of the popu- 
lation when gametic disequilibrium is zero. Note that 
the no-conditional lethal relative number will reach an 
asymptote with any fitness function as the average 
fitness goes to one and the conditional lethal alleles are 
removed from the population by selection (as op- 
posed to the case here where it reaches an asymptote 
as the gametic disequilibrium dissipates). 
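The relative number discussed here is defined in the Fig. 3 caption as the no-conditional lethal genotype frequency multiplied by the cumulative average fitness. A minimal sketch of that bookkeeping follows; the function name and the numbers in the usage example are hypothetical placeholders, not values from the simulations.

```python
import math


def relative_number(no_cl_frequency, mean_fitness_by_generation):
    """No-CL frequency scaled by the cumulative average fitness.

    mean_fitness_by_generation holds the population mean fitness in each
    generation since release; their product is the population size
    relative to no release when mortality is density independent.
    """
    cumulative_fitness = math.prod(mean_fitness_by_generation)
    return no_cl_frequency * cumulative_fitness


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(relative_number(0.05, [0.8, 0.9]))
```

When the population is always at carrying capacity the cumulative fitness term is irrelevant and the frequency alone gives the number relative to no release.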

The optimal number of loci to be used in a conditional lethal release
can be determined from Fig. 3 a and b. In the case of a population
always at carrying capacity, the optimal locus number is the value of L
which minimizes no-conditional lethal genotype frequency in the
generation that the released trait is expected to become lethal. In the
case of density-independent mortality, the optimal locus number is the
L that minimizes the relative number of no-conditional lethal genotype
individuals in the generation that the conditional lethal allele is
activated. Optimal L depends strongly on the generation that the
conditional lethal allele will become lethal: in the constant
population size case the optimal L increases from 4 if the lethality
occurs in the second generation to 6 if conditional lethal is lethal in
the fourth generation and to 8 if the lethality occurs in the sixth
generation. In the density-independent case, the optimal values are 4,
8, and 8, respectively. Note that these values are only accurate to ±1,
because only even values of L were checked.
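The selection rule described above amounts to taking a minimum over the values of L that were simulated. A short sketch, with a hypothetical function name and made-up input values standing in for simulation output:

```python
def optimal_locus_number(value_by_loci):
    """Return the L that minimizes the supplied quantity.

    value_by_loci maps each simulated L to either the no-CL genotype
    frequency (population at carrying capacity) or the relative number of
    no-CL insects (density-independent mortality), both evaluated in the
    generation in which the CL allele becomes lethal.
    """
    return min(value_by_loci, key=value_by_loci.get)


if __name__ == "__main__":
    # Hypothetical frequencies for even L only, as in the text.
    simulated = {2: 0.30, 4: 0.13, 6: 0.15, 8: 0.21}
    print(optimal_locus_number(simulated))
```

Because only even L enter the dictionary, the result carries the same ±1 uncertainty noted in the text.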

At the optimal L, the no-conditional lethal numbers relative to
populations with no release are significantly greater than in the ideal
case, but still low: 0.05 in genotype frequency and 0.019 in relative
numbers by the $F_4$ generation, and 0.04 and 0.0096, respectively, in
the $F_6$ generation (with standard model conditions and a release size
of 2:1).

Fig. 4. 2:1 all-male release with a 10% cost per released-type allele
in the genotype. (a) Frequency of no-conditional lethal genotype. In
the case of population always at carrying capacity, this is the number
of no-conditional lethal relative to that with no release. (b) Relative
number of no-conditional lethal genotype. (No-conditional lethal
genotype frequency multiplied by cumulative average fitness.) In the
case of density-independent mortality, this is the number of
no-conditional lethal genotype insects relative to no release.

Ten Percent Fitness Cost per Released Allele. Fig. 4 a and b shows the
no-conditional lethal genotype frequency and relative number for a 2:1
all-male release with a 10% fitness cost per conditional lethal allele.
The effect of the increased selection against the conditional lethal
allele is obvious: selection against the conditional lethal alleles in
phase 1 is very strong and the dispersal of the conditional lethal
alleles in phase 2 causes little or no (depending on L) decrease in
no-conditional lethal frequency before selection in favor of the
no-conditional lethal type is dominant again in phase 3. For high L
(16-20), the conditional lethal allele is almost completely removed
from the population in the release generation. The efficacy of the
release is diminished, with the lowest no-conditional lethal relative
number attained being ≈0.060 (compared with 0.019 with a 5% cost per
allele). The lowest no-conditional lethal frequency attained (for L = 4
with conditional lethal being lethal in the fourth generation) is 0.21
(compared with ≈0.05 with a 5% cost per allele). If mortality is
density-independent, then the low average fitness of the population and
the resulting decrease in total population size somewhat ameliorates
the effect of the low competitiveness of insects carrying conditional
lethal alleles.

The increased cost of the conditional lethal allele causes the optimal
L values to be lower. For constant population size the optimal L is 2
if the conditional lethal allele becomes lethal in the $F_2$ generation
and 4 if the conditional lethal allele becomes lethal later. When
mortality is density-independent, the optimal value of L is 4
regardless of the generation that the conditional lethal allele becomes
lethal.

Effect of Size of Release. 1:2 All-Male Release. The previous results
were for releases of a size which gives an initial 2:1 ratio of
released to wild individuals. Fig. 5 a and b shows no-conditional
lethal genotype frequency and relative number in a 1:2 all-male
introduction with a 5% cost per released allele. The release is
ineffective if the conditional lethal allele becomes lethal in the
$F_2$ generation. This size of release can still be effective if the
conditional lethal becomes lethal in the fourth or sixth generation,
especially if the mortality is density independent (see Fig. 5b).
Fig. 5. Effect of size of release. Frequency and relative number of
no-conditional lethal genotype insects in a 19:2 and 1:2 release.



19:2 All-Male Introduction. Fig. 5 c and d shows no-conditional lethal
genotype frequency and relative number for a 19:2 all-male release. A
large release with high L reduces the no-conditional lethal frequency
to levels in the range of $10^{-7}$ and lower.

Comparing Figs. 3a, 5a, and 5c and Figs. 3b, 5b, and 5d, we see that
the optimal L increases with the size of the release. This outcome is
seen because the effectiveness of selection against the conditional
lethal allele decreases as the no-conditional lethal genotype becomes
rare when release ratios are high. With low frequency of the
no-conditional lethal genotype, there is little variation in fitness
for selection to act on.

Summary of the Effect of Conditional Lethal Allele Cost and Size of
Release. Tables 2-4 show the (1) optimal L for the case of a population
held at carrying capacity, (2) no-conditional lethal genotype
frequencies at that L, (3) optimal L when mortality is
density-independent, and (4) relative number of no-conditional lethal
insects at the density-independent optimal L. Optimal L values are
calculated for the conditional lethal allele activation in the $F_2$
(Table 2), $F_4$ (Table 3), and $F_6$ (Table 4) generations for various
combinations of release size and nonconditional fitness cost to the
conditional lethal allele.

Table 2. Optimal locus numbers and frequencies when the CL allele becomes lethal in the $F_2$ generation

Cost    Ratio of        Optimal L for      Frequency of no-CL      Optimal L               Relative no. of no-CL
        released:wild   constant           genotype at optimal L   (density-independent)   insects at optimal L
                        population size
0.025   1:2             4                  0.362                   4                       0.325
0.025   2:1             6                  0.086                   6                       0.064
0.025   19:2            6                  $8.30 \times 10^{-3}$   8                       $4.87 \times 10^{-3}$
0.05    1:2             4                  0.450                   4                       0.370
0.05    2:1             4                  0.129                   4                       0.087
0.05    19:2            6                  0.017                   6                       $7.38 \times 10^{-3}$
0.10    1:2             2                  0.547                   2                       0.44[?]
0.10    2:1             2                  0.240                   4                       0.123
0.10    19:2            4                  0.044                   4                       0.015

Column 1: Optimal L when population size is always at carrying
capacity. Column 2: Frequency in the $F_2$ generation at that optimal
value of L. Column 3: Optimal L when mortality is density independent.
Column 4: Relative number in the $F_2$ generation at that optimal value
of L.

Comparison of All-Male and Male-Female Releases. Fig. 6 compares
no-conditional lethal genotype frequency, no-conditional lethal
relative number, and relative population size for four types of
releases: (1) a 2:1 all-male release, (2) a 2:1 male-female release,
(3) a 4:1 male-female release (all with a fitness cost of 5% for
conditional lethal alleles), and (4) a 4:1 male-female release with a
conditional lethal allele cost of 2.5%. L is set to 10 for all
simulations of these releases. The total number of individuals in all
of the 2:1 releases is equal and the total in all of the 4:1 releases
is equal. The 4:1 male-female release has the same number of males as
the 2:1 all-male release, but also has an equal number of females.

2:1 Releases. The no-conditional lethal genotype frequency initially
increases more in the all-male release than in the equal sized
male-female release (as explained in Appendix 2). After the $F_1$
generation, the separation between the all-male and male-female
no-conditional lethal genotype frequency curves decreases, but there
continues to be a higher portion of no-conditional lethal genotype
insects in the all-male release indefinitely. However, there is very
little difference between relative numbers of no-conditional lethal
genotype insects in the two releases (Fig. 6b).

Equal Numbers of Released Males. The 2:1 all-male release and the 4:1
male-female release have the same number of males; the difference is in
whether the females are also released. Fig. 6 a and b shows that
releasing the females brings substantial benefit in reducing the
frequency of no-conditional lethal types: there is a fivefold
difference in no-conditional lethal genotype frequency between the two
releases and a 15-fold difference in relative numbers of no-conditional
lethal genotypes.

In the field, in the male-female release, the initial relative
population size (cumulative average fitness) reflects the increase in
the number of females. Thus, it is initially (before selection is
applied) set at 3 in the 2:1 release and 5 in the 4:1 release (Fig.
6c). Because L = 10, selection is strong against the conditional lethal
allele. When mortality is density independent, population size drops
quickly from the release generation level. The rate of the decrease is
determined by the fitness cost of the released allele. If there is a 5%
cost, then the population size drops below the nonrelease size
(cumulative fitness = 1) in one generation. If the cost is 2.5%, then
selection against the released insects is weaker, and the population
size does not drop below the nonrelease size until the fourth
generation (see Fig. 6c).

Two Types of Released Alleles. Test of Sensitivity of Results to Equal
Fitness Cost Assumption. In the previous simulations it was assumed
that the conditional lethal allele caused the same nonconditional
decrease in the insect's fitness no matter where it was inserted within
the genome. We use the two-allele type model to assess the importance
of this assumption. We compare two releases: (1) a release with n loci
of fitness cost $c_1$ per allele and n loci of cost $c_2$ per allele,
and (2) a release with 2n loci with fitness cost
$1 - \sqrt{(1 - c_1)(1 - c_2)}$. The cost function in (2) is midway
between $c_1$ and $c_2$ and makes the all-conditional lethal genotype
have the same fitness in both cases. Thus, we test how well a release
with two allele types is approximated by a release with one allele type
of the average effect. Fig. 7 shows this comparison for L = 5 with
allele costs of $c_1$ = 0% and $c_2$ = 10% versus allele cost of
$c_1 = c_2 = 0.0513$ (as given by the above formula). The
no-conditional lethal frequency is indistinguishable between the two
releases until at least the fifth generation, whereas the
no-conditional lethal relative number is indistinguishable
indefinitely. Given that few insect species of interest have more than
approximately six generations per year, this indicates that the
assumption of equal fitness cost of conditional lethal alleles across
loci has negligible effect on the results.

[Table residue (Table 3 or 4; body not recoverable). Column headings:
Cost; Ratio of released:wild; Optimal L for constant population size;
Frequency of no-CL genotype at optimal L; Optimal L
(density-independent); Relative no. of no-CL insects at optimal L.
Surviving entries: 0.025; 0.247; 4-6.]

Column 1: Optimal L when population size is always at carrying
capacity. Column 2: Frequency in the F[?] generation at that optimal
value of L. Column 3: Optimal L when mortality is density independent.
Column 4: Relative number in the F[?] generation at that optimal value
of L.

Fig. 6. Comparison of all-male and male-female releases. The 2:1
all-male and 2:1 male-female releases are releases of the same size,
but with different gender distributions. The 4:1 male-female release
has the same number of males as the 2:1 all-male release, but includes
the females not released in the all-male release. (a) Frequency of
no-conditional lethal genotype. In the case of complete
density-dependence, this is the number of no-conditional lethal
relative to that with no release. (b) Relative number of no-conditional
lethal genotype insects. (All-wild genotype frequency multiplied by
cumulative average fitness.) In the case of density-independent
mortality, this is the number of no-conditional lethal genotype insects
relative to no release. (c) Cumulative average fitness. If mortality is
density-independent, then this is the population size relative to no
release. In a release with females, this is initially set to account
for the increase in breeding population size brought by releasing
females (see Methods).
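The single equivalent cost used in release (2) can be reproduced directly. The sketch below assumes that per-allele costs combine multiplicatively, which is what makes the all-conditional lethal genotype equally fit in both releases; the function name is hypothetical.

```python
import math


def equivalent_single_cost(cost_1, cost_2):
    """Per-allele cost c with (1 - c)^2 == (1 - cost_1) * (1 - cost_2).

    Assumes multiplicative fitness across alleles, so a release with 2n
    loci at cost c matches the all-CL genotype fitness of a release with
    n loci at cost_1 and n loci at cost_2.
    """
    return 1.0 - math.sqrt((1.0 - cost_1) * (1.0 - cost_2))


if __name__ == "__main__":
    c = equivalent_single_cost(0.0, 0.10)
    print(round(c, 4))  # the text quotes 0.0513 for these inputs
    n = 5
    two_types = (1.0 - 0.0) ** (2 * n) * (1.0 - 0.10) ** (2 * n)
    one_type = (1.0 - c) ** (4 * n)
    print(two_types, one_type)  # all-CL genotype fitness in each release
```

The equivalent cost is a geometric rather than arithmetic average, so it sits slightly above the midpoint of the two costs (0.0513 versus 0.05 here).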



Optimal Locus Number for Two Allele Types. We can also explore
effectiveness and optimal locus number for a release with two different
allele costs. A typical situation might be that the conditional lethal
allele has been successfully inserted at low fitness cost on a small
number of loci and at higher fitness cost on other loci. It would then
be useful to know how many of the higher cost loci to use in the
release. Fig. 8 a and b shows no-conditional lethal genotype frequency
and relative number for a 2:1 all-male release with a 0 cost
conditional lethal allele on three loci and 0.05 cost conditional
lethal allele on $L_2$ loci, where $L_2$ varies from 0 to 14. This
release is very effective, achieving reductions of 98.5% in frequency
and 99.2% in relative numbers of no-conditional lethal types by the
fourth generation for optimal L. The no-conditional lethal genotype
frequencies increase only slowly from the minimum brought about by the
breakdown of gametic disequilibrium. Table 5 shows optimal values of
$L_2$ along with the frequencies and relative numbers of the
no-conditional lethal genotype. The optimal $L_2$ is 0 when the
conditional lethal becomes lethal in the $F_2$ generation, 4 in the
$F_4$ generation, and 6 in the $F_6$ generation for both extremes of
density dependence. It is better to only use the three 0 cost loci if
the conditional lethal allele becomes lethal in the $F_2$ generation,
but it becomes beneficial to use the higher cost loci if the
conditional lethal allele becomes lethal in later generations.

Fig. 7. Comparison of a release with 10 alleles of fitness cost 0.0513
each and a release with five alleles of 0 fitness cost and five alleles
with 10% fitness cost each. These numbers were chosen to make the
fitness of the all-released type equal in both releases.

Having even a few very low cost loci is very beneficial if the
conditional lethal allele becomes lethal in later generations. Compare
Table 5 with Tables 2-4, which show data for the same release without
the three loci with zero cost conditional lethal alleles. The benefit
given by the three 0 cost loci is small if the conditional lethal
allele becomes lethal in the $F_2$ generation (phase 2). Because
breakdown of gametic disequilibrium is the dominant force for change in
genotype frequencies at this stage, decreasing the fitness cost on a
few loci has little effect. The decrease in fitness cost does have an
effect in later generations when selection becomes more important. The
benefit of the 0 cost loci is substantial if the conditional lethal
allele becomes lethal in the $F_4$ and $F_6$ generations, decreasing
the number of surviving insects by factors of ≈2 and 4, respectively.

Figure 8 c and d shows the same release, but with a 0.025 cost
conditional lethal allele replacing the 0 cost allele. Table 5 has
optimal $L_2$ and frequencies at optimal $L_2$ for this release. Again,
it is best to use only the loci with low fitness cost if the
conditional lethal allele becomes lethal in the $F_2$ generation but
use the higher cost loci if lethality occurs in later generations.
Again, the addition of the loci with low fitness cost has little effect
until later generations. It is not until the $F_6$ generation that the
addition of the three 0.025 cost loci brings any substantial benefit
over the release without these insertions.

Effect of Laboratory Trait Fitness Cost. The pur- 
pose of this section is to compare releases in which the 
insects carry deleterious recessive traits resulting from 
laboratory rearing to releases in which insects are free 
of such traits. 

Figure 9 a and b shows a 2:1 release with a 0% cost conditional lethal
allele on L loci and deleterious recessive laboratory traits on two
loci. The fitness of individuals homozygous at both loci for the
laboratory trait is set at 50% of the fitness for the same genotype
without any laboratory trait alleles. For L = 10 in the $F_4$
generation, the frequency of the genotype with no conditional lethal
alleles (but laboratory trait alleles possible) is
$4.7 \times 10^{-3}$ and the relative number is $2.8 \times 10^{-3}$.
These are an order of magnitude higher than for the same release with
no laboratory trait costs (comparing with L = 10 in Table 1). In the
$F_6$ generation the frequency and relative number are
$9.8 \times 10^{-4}$ and $5.2 \times 10^{-4}$, again an order of
magnitude higher than the release with no laboratory-associated traits.

Fig. 8. Releases with a mix of higher and lower cost conditional lethal
alleles. (a and b) A 2:1 release with insects carrying conditional
lethal alleles with 0 fitness cost on three loci and 5% fitness cost on
a varying number of loci. (c and d) A 2:1 release with insects carrying
conditional lethal alleles with 2.5% fitness cost on three loci and 5%
fitness cost on a varying number of loci (number shown by curve).

Figure 9 c and d shows the same release, but with a fitness cost of 80%
for individuals homozygous for the laboratory trait on both loci. The
no-conditional lethal genotype frequency and number are ≈0.05 and 0.02
for L = 10 in the $F_4$ generation. In the $F_6$ generation these are
0.02 and $7 \times 10^{-3}$, respectively.
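The laboratory-trait fitness scheme described in the Fig. 9 caption (full reduction when both recessive loci are homozygous for the trait, the square root of that value when one locus is, and no reduction otherwise) can be written as a one-line rule. The sketch below is an illustration under that reading of the caption; the function and parameter names are hypothetical.

```python
def laboratory_fitness(n_homozygous_loci, fitness_if_all=0.5, n_lab_loci=2):
    """Fitness multiplier from recessive laboratory-rearing traits.

    Each locus homozygous for the laboratory allele contributes an equal
    multiplicative share of the full reduction, so with two loci and
    fitness_if_all = 0.5 the multipliers are 1, sqrt(0.5), and 0.5.
    """
    if not 0 <= n_homozygous_loci <= n_lab_loci:
        raise ValueError("n_homozygous_loci must lie between 0 and n_lab_loci")
    return fitness_if_all ** (n_homozygous_loci / n_lab_loci)


if __name__ == "__main__":
    for fitness_if_all in (0.5, 0.2):  # 50% and 80% fitness reductions
        print([round(laboratory_fitness(k, fitness_if_all), 3) for k in range(3)])
```

Because the traits are recessive, heterozygous carriers pay no cost, so the laboratory alleles are removed only slowly once they have spread into the wild background.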

Table 6 compares the reduction in wild population for SIT and
conditional lethal releases (with L = 10) with 50 and 80% fitness
reductions caused by laboratory-associated traits. In the SIT release,
the 50 and 80% reduction in fitness increases the surviving genotype
population by factors of 1.6 and 2, respectively (comparing Tables 1
and 6). In the conditional lethal releases, the surviving population
increases by an order of magnitude when fitness is reduced by 50% and
two orders of magnitude when fitness is reduced by 80%. The impact of
genetically based laboratory fitness costs on conditional lethal
releases is greater than on SIT releases.

Table 5. Optimal locus numbers for a CL release with 3 loci with CL alleles that carry an unconditional fitness cost of 0% or 2.5% and $L_2$ loci with CL alleles that carry a 5% unconditional fitness cost

Cost    Generation   Optimal $L_2$ for   Frequency of no-CL          Optimal $L_2$           Relative no. of no-CL
                     constant            genotype at optimal $L_2$   (density-independent)   insects at optimal $L_2$
                     population size
0       $F_2$        0                   0.065                       2                       0.062
        $F_4$        4                   0.015                       4                       0.0078
        $F_6$        6                   0.0084                      6                       0.0024
0.025   $F_2$        0                   0.074                       0                       0.066
        $F_4$        4                   0.035                       4                       0.012
        $F_6$        4                   0.019                       6                       0.0050

Fig. 9. Effect of fitness reduction associated with laboratory rearing.
(a and b) 2:1 all-male release with a 0 fitness cost per released-type
allele on L loci and 50% laboratory rearing fitness reduction
determined by two recessive loci: the genotype has fitness 0.5 if both
loci are homozygous for the laboratory trait allele, $\sqrt{0.5}$ if
the genotype is homozygous for the laboratory trait on one locus, and
one if neither. (c and d) 2:1 all-male release with a 0 fitness cost
per released-type allele on L loci and 80% laboratory rearing fitness
reduction determined by two recessive loci.

Discussion 

Ideally, How Effective Can Multilocus Conditional Lethal Releases Be?
The potential effectiveness of the conditional lethal release method is
best judged in comparison with the sterile male release method.
Mass-release strategies suffer from many complications that we do not
model here. Migration of insects, poor timing of release, weather
conditions, and a multitude of other factors reduce the effectiveness
of releases. Although we cannot model all of these factors, we can look
at SIT, see how well it has done in various circumstances, and compare
the likely effectiveness of conditional lethal releases with it.

An "ideal" sterile male release (a release where the sterile insects
compete equally with the wild ones) will reduce the population in the
next generation by a fraction equal to the fraction that the sterile
males make of the total male population. For example, a 2:1 SIT release
reduces the wild population to 0.2 of its original size. By comparison,
a 2:1 ideal conditional lethal release with L = 18 reduces the
population to 0.04 of its original size if conditional lethal is lethal
in the second generation, 0.000024 if conditional lethal is lethal in
the fourth generation, and $9.1 \times 10^{-8}$ if conditional lethal
is lethal in the sixth generation. Conditional lethal releases are
potentially a far more powerful technique than sterile insect releases.
Even small releases of size 1:6 can be effective if a higher (4-6)
number of generations is available for the conditional lethal alleles
to spread before they become lethal.

Table 6. Comparison of the effect of laboratory-associated fitness reductions on SIT release and on CL release

                                 Relative frequency of surviving genotype
Generation   Ratio of        SIT (50% fitness   SIT (80% fitness   L = 10 (50% fitness     L = 10 (80% fitness
             released:wild   reduction)         reduction)         reduction)              reduction)
$F_1$        2:1             0.3                0.71
$F_2$ CDD    2:1                                                   0.112                   0.309
$F_2$ DI                                                           0.0743                  0.144
$F_4$ CDD    2:1                                                   $4.67 \times 10^{-3}$   0.0507
$F_4$ DI                                                           $2.76 \times 10^{-3}$   0.0215
$F_6$ CDD    2:1                                                   $9.81 \times 10^{-4}$   0.0184
$F_6$ DI                                                           $5.2 \times 10^{-4}$    $7.00 \times 10^{-3}$
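The rule for an ideal sterile male release stated above can be turned into a short calculation. This sketch is illustrative: the function name is hypothetical, and it assumes that a release ratio counts released males against all wild individuals at a 1:1 sex ratio, an assumption chosen because it reproduces the 0.2 quoted for a 2:1 release. The optional fitness argument scales the mating competitiveness of sterile males.

```python
def sit_surviving_fraction(released_males, wild_individuals, sterile_fitness=1.0):
    """Fraction of the next generation surviving a sterile male release.

    Equals the share of effective matings obtained by wild males.
    Assumes a 1:1 wild sex ratio and that sterile_fitness scales the
    competitiveness of released males (1.0 is the "ideal" release).
    """
    wild_males = wild_individuals / 2.0
    effective_sterile_males = released_males * sterile_fitness
    return wild_males / (wild_males + effective_sterile_males)


if __name__ == "__main__":
    print(sit_surviving_fraction(2, 1))        # ideal 2:1 release
    print(sit_surviving_fraction(2, 1, 0.5))   # sterile males at 50% fitness
```

Unlike the conditional lethal case, this reduction acts in a single generation, so additional generations before activation bring no further gain.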

Examination of Table 1 shows that the effectiveness 
of conditional lethal releases will not greatly exceed 
that of SIT for species with a small number (1-2) of 
generations before the conditional lethal is triggered: 
three or more generations are needed for the trait to 
spread into the wild population. 

The modeling work of Klassen and coworkers (see Introduction for
references) allowed up to three different conditional lethal traits,
each controlled by up to four loci with additive effects. One of the
cases studied in Klassen, Creech, and Bell (1970a), that of two or
three conditional lethal traits controlled by a single autosomal locus
each, is similar to what we have studied here. However, the potential
for genetically engineering insect strains with a conditional lethal
trait on many loci potentially makes the conditional lethal technique
far more powerful than demonstrated by Klassen and coworkers. The work
of Kerremans and Franz (1995) dealt with a conditional lethal gene on a
single locus. They concluded that this was not effective compared with
other strategies, but could be effective in combination with a
Y-autosome translocation. However, even in combination with the
translocation, this technique is much less effective than what we have
demonstrated here, again because genetic engineering techniques should
allow many loci to be used.

What Effect Does a Fitness Cost of the Released 
Alleles Have? It is useful in comparing SIT and con- 
ditional lethal releases to separate fitness reductions 
caused by genetic manipulation from fitness reduc- 
tions caused by laboratory rearing. Undesired selec- 
tion by laboratory conditions is likely to be an ines- 
capable feature of all strategies that use large-scale 
release of insects. The conditional lethal release 
method, and all mass-release strategies, will suffer 
from this problem. However, damage to the insect's 
genome caused by the insertion of conditional lethal 
alleles is, at least in principle, under the control of the 
geneticist. It may be possible to minimize such damage 
by improving techniques or by screening many inser- 
tion events for those with the least fitness cost. 

The fitness reduction in SIT caused by irradiation (the "genetic
manipulation" component) alone is on the order of 50-80% (see
Introduction for references). This percentage translates directly to
the reduction in effectiveness of SIT. The picture for conditional
lethal releases is more complicated, because the fitness reduction
depends on the number of insertions made into the insect's genome and
because the selection occurs over multiple generations instead of just
one.

It is clear that the effectiveness of conditional lethal 
releases decreases rapidly as the fitness cost of inserted 
alleles increases. Verv small releases (e.g., 1:6 [see 
Table 1)) can reduce no-conditional lethal frequency 
to low levels when there is no fitness cost for inserted 
alleles. However, at a fitness cost of 5% per released 
allele, it takes a release on the order of 1:1 to get a 
useful reduction in no-conditional lethal frequency 
(see Tables 2-4). At a fitness cost of 10% per allele, it 
takes a release of size on the order of 10:1 (see Tables 
2-4) to achieve meaningful reduction in no-condi- 
tional lethal frequency. If mortality is density-inde- 
pendent (and thus relative number is meaningful) , the 
adverse effect of selection against released alleles is 
somewhat ameliorated by the increased genetic load 
incurred by the population as the cost of the released 
alleles increases. For example, examining Fig. 3c, we 
see that the population in the higher L cases is reduced
to 30-40% of its original size by the F 4 generation 
without the conditional lethal allele ever becoming 
activated. Smaller releases might still be feasible at 
higher released allele cost if the density-dependence 
in mortality is weak. 

The results from the model with two types of con- 
ditional lethal alleles (Fig. 8) indicate that if a small 
number of insertions can be made with zero or low 
fitness cost, the effectiveness of the conditional lethal 
release method is increased — even with high fitness 
costs of insertions on other loci. The zero-cost alleles
are selected against in the first few generations after 
the release, because of their genetic association with 
the higher cost released alleles. Once this association 
has broken down, however, the frequency of these 
alleles is largely untouched by selection and the cor- 
responding loci contribute indefinitely to keeping no- 
conditional lethal genotype frequency low. 

The results of Mackay et al. (1992) (see Methods
section) indicate that an average fitness cost of 5% per 
insertion is attainable (in fact, better than 5% is at- 
tainable because insertion events can be screened and 
some insertion events in the Mackay et al. 1992 study 
appeared to have no negative effect on viability). 
Assuming that geneticists are never able to do better 
than this, how effective would conditional lethal re- 
leases be? From Tables 2-4, we see that a 2:1 all-male 
conditional lethal release with a 5% fitness cost per 
conditional lethal allele reduces the population to 
1-12% of its original size, depending on the generation 
that the conditional lethal allele becomes lethal and on 
the degree of density dependence. By comparison, a 
2:1 sterile male release with a 50% fitness of sterile 
males only reduces the population to 33% of its original 
size. If the fitness cost per insertion can be reduced to 
2.5%, then the population can be reduced to well 
under 1% of its original size with one 2:1 release. In this 
case, conditional lethal releases are very powerful and 
it becomes feasible to do releases on a far smaller scale 
than the "overflooding" releases necessary with SIT. 

If there is a fitness cost for the conditional lethal
alleles, the timing of the lethality of the conditional
lethal allele becomes important. The high selection
against the conditional lethal allele in the first few 
generations (when the conditional lethal alleles are in 
high association with each other) keeps no-condi- 
tional lethal frequency up and, thus, conditional lethal 
releases are not particularly effective if lethality oc- 
curs in the F 2 generation. Beyond the F 2 , the break- 
down of linkage disequilibrium overcomes this selec- 
tion, and no-conditional lethal genotype frequencies 
drop rapidly. By about the seventh generation, selec- 
tion predominates again and no-conditional lethal ge- 
notype frequencies rise again. If mortality is density- 
independent, no-conditional lethal numbers don't 
rebound. Although no-conditional lethal genotype 
frequencies are increasing, the total relative popula- 
tion size is actually decreasing (because population 
average fitness is less than one). If there is substantial 
density-dependence in mortality, then no-conditional 
lethal numbers rebound rapidly after the seventh gen- 
eration. 

What is the Optimal Number of Loci at Which To 
Introduce the Conditional Lethal Gene? Optimal 
number of loci is highly variable, depending on the 
fitness cost of conditional lethal alleles, the size of the 
release, and the generation in which the conditional 
lethal allele will become lethal. Optimal locus number 
decreases with increased fitness cost of the condi- 
tional lethal allele, increases with increasing size of the 
release, and increases with a higher number of gen- 
erations before lethality. If lethality is in the second 
generation, the optimal locus number is in the range 
4-6, except if the cost of the allele is very high (e.g., 
10%). When lethality occurs in the F 4 and F 6 gener- 
ations, the optimal L is also 4-6 when conditional 
lethal allele cost is 10%. For an allele cost of 5% with 
lethality in generations F 4 and F 6, optimal L is 4-8 for
smaller releases (1:2 and 2:1) and 10 for the 19:1
release. The optimal L is most variable for an allele cost 
of 2.5%, ranging from 8 to 20 when lethality occurs in 
the F 4 -F 6 generations. Optimal L is generally similar 
between the density-dependent and density-indepen- 
dent case, but occasionally higher in the density-in- 
dependent case. 

The picture becomes even more complicated when 
the fitness cost of the conditional lethal alleles varies 
between loci. However, comparisons of Table 5 with 
Tables 2-4 show that the optimal values for the total 
number of loci used are not much different between 
the examples with one allele type and two allele types 
(i.e., La +3 in Table 5 is close to the appropriate values 
of L in Tables 2-4). We do not know the extent to 
which this property applies in systems with more
complicated variations in fitness cost.

How Much do Decreases in Field Fitness Caused By 
Laboratory Rearing Decrease the Effectiveness of 
This Technique? Maternal/Environmental Effects.
Laboratory insects may have lower fitness because of
direct environmental effects (e.g., crowded rearing
conditions) or maternal effects. Because these effects 
are typically not heritable, they should not affect the 
conditional lethal approach much differently than 
they affect SIT. 



Genetic Effects. Genetic load associated with laboratory
strains decreases the effectiveness of the conditional
lethal release method substantially. The no-conditional
lethal genotype frequency increased by an
order of magnitude for a 50% fitness reduction and by
two orders of magnitude for an 80% reduction in fitness.
These compare with increases on the order of
two- to fivefold for SIT. Because of a longer period of 
field selection against the released insects, laboratory 
rearing fitness reductions have a greater effect on 
conditional lethal releases than SIT. For the 50% lab- 
oratory rearing cost, the conditional lethal releases 
still achieve a much larger reduction in population if 
the conditional lethal allele becomes lethal in later 
generations. When the laboratory rearing cost is 80%, 
the conditional lethal release method is still superior, 
but the margin is less than an order of magnitude. 

Traditionally, the procedures for producing insects 
for mass release have emphasized quantity over quality.
Given the theoretical potential for achieving major
wild population reductions with much smaller re- 
leased populations than with SIT, it may be better to 
try for smaller but genetically higher quality release 
populations for conditional lethal releases. The effec- 
tiveness of the two methods converges as quality of the 
released insects decreases. 

How Much Difference in Effectiveness is There 
Between an All-Male Release and a Male-Female Re- 
lease? How is the Pest Potential of the Population In- 
creased by Release of Females? A laboratory population 
of insects cannot be raised without the females. Once 
it comes time for the release, there are two choices: 
devise some method for separating the females from 
the males or release the females along with the males 
and accept the increase in pest numbers. Separating 
females from males can be very costly, and is impos- 
sible in some cases (however, progress on genetic 
sexing systems has been made (Saul 1990, McCombs 
et al. 1993, McCombs and Saul 1995)). 

In SIT, release of females is detrimental to control 
if the presence of sterile females decreases the number 
of matings between sterile males and wild females. If 
mating is random, this will occur only if the available 
number of female matings exceeds the available num- 
ber of male matings. In a conditional lethal release, the 
release of females is beneficial to control as long as the 
mating is random, because it increases the frequency 
of the conditional lethal allele in the population. This 
is clear in a comparison of the 2:1 all-male and 4:1 
male-female releases in Fig. 6. The no-conditional 
lethal genotype frequency in generations 4-6 when 
females are released is on the order of one-fifth the 
size of the no-conditional lethal frequency when fe- 
males are not released. The difference in relative num- 
bers of no-conditional lethal types is on the order of 
10- to 20-fold. Thus, there is great benefit in releasing 
the females. This benefit is decreased if, due either to
behavioral or spatial factors, released males are more 
likely to mate with released females. Unlike a sterile 
release, however, matings between released individuals
do not completely "waste" the released insects
because their descendants can still mate with wild
insects. However, if selection is strong against the
all-conditional lethal genotype, such matings will increase
the rate at which the released allele is removed
from the population. 

Releasing females would also be advantageous 
when the males in the target species are sexually 
selected. In such species, a small number of the most 
fit males get most of the available matings. Because 
released males are unlikely to be among the most fit 
males available, the fraction of matings involving re- 
leased males is far less than that expected with random 
mating. Thus, the effectiveness of the release de- 
creases if only males are released. However, it is ben- 
eficial if released females mate predominately with 
wild males. This minimizes the number of matings 
between high-conditional lethal genotypes, which are 
undesirable because of the low fitness of the offspring. 
Thus, release of females may be highly advantageous 
in species where females select male mates based on 
genetic fitness components. This aspect of conditional
lethal releases may be of great benefit relative to 
sterile male releases. 

Another aspect of mating biology with different 
impacts on SIT and conditional lethal releases is single 
versus multiple matings in females. SIT is less effective 
if females mate with multiple males. In conditional 
lethal releases, the number of matings per female has 
no impact on either all-male or male-female releases.

An equally important consideration is the increase
in the size (and therefore pest potential) of the population
caused by the release of females. If the fitness
cost of the released-type alleles is high in a medium-sized
release, then the genetic load introduced into
the population quickly pulls population numbers be- 
low no-release levels. If the fitness cost of the released 
allele is low, then the population size can stay large for 
an extended period, and pest damage could be in- 
creased substantially (see Fig. 8c). 

Evaluating the pest damage caused by a release 
requires more than the simple analysis here. We only 
wish to make the point that the genetic load brought 
by the released alleles can in itself suppress population 
levels if density dependence is not strong, and may be 
beneficial in short term population suppression. 

Other Issues. Given the rapid progress of tech- 
niques for genetic transformation, it should be possi- 
ble to mass release genetically engineered insects in 
the near future. We have shown that the release of 
insects carrying a dominant conditional lethal trait on 
multiple loci has the potential to be several orders of 
magnitude more effective than sterile insect releases. 
However, we have also shown that the effectiveness of 
conditional lethal releases is strongly dependent on 
the quality of the released insects. "Quality" can be 
divided into (at least) two parts: the quality of the 
genetic alterations and the genetic quality of the lab- 
oratory strains of the target species. 

The effectiveness of conditional lethal releases decreases
rapidly as the nonconditional fitness reduction
caused by the insertions increases. If genetic insertion
techniques never become better than causing an average
5% fitness reduction per insertion, then the
effectiveness of conditional lethal releases is reduced
to the order of a 5- to 10-fold superiority over SIT. 
However, if the conditional lethal trait can be intro- 
duced on even a few loci at a fitness cost close to zero, 
then the effectiveness increases. 

Quality problems in mass rearing are also a serious 
problem. Because of a longer period of field selection, 
conditional lethal releases are more affected by ge- 
netic load resulting from laboratory rearing than SIT 
is. This effect alone can reduce the effectiveness of 
conditional lethal releases to within one to two orders 
of magnitude of SIT. Given the possibility of doing 
conditional lethal releases with much smaller and 
fewer releases than SIT, it may be feasible to rear
insects of a higher quality than has been typical with
SIT.

Given the assumptions that went into this modeling 
work, a large number of theoretical issues remain. The 
most important, perhaps, is nonrandom mating. As
noted above, assortative mating behaviors can reduce 
the effectiveness of mass-release strategies. It is vital 
to understand the effect of such behaviors on the 
dynamics of the release. Spatial structure will also be 
important in species with very low or very high rates 
of dispersal. If dispersal is too low, then the released 
insects will remain in clumps and will not mate with 
wild insects. If dispersal is too high, then it will be 
difficult to maintain a sufficient density of conditional 
lethal-carrying insects in the target area. Dispersal 
rates will be a key factor in determining how large and 
often releases are needed with the conditional lethal 
release method. 

Given that we did not find a great difference between
the density-independent and constant population
size cases, population dynamics and age structure
within a season may not be of great theoretical or
practical importance in developing conditional lethal
release strategies. However, population dynamics between
seasons may be important. A great deal of theoretical
work has been done on population dynamics
and sterile male releases (see Ito et al. 1989 for a 
review), and most results should apply to conditional 
lethal releases. 

Throughout this article we have assumed that the 
allele for conditional lethality is dominant and that 
100% of individuals that have one or more copies of the 
allele will die once the conditional lethal trait is trig- 
gered. There is, therefore, an assumption that the 
phenomenon of gene silencing will not occur. Gene 
silencing occurs when individuals carrying multiple 
copies of a gene do not express the trait because of 
interference in the transcriptional or posttranscrip- 
tional process. The phenomenon of gene silencing has 
been documented in most detail for plants (Vaucheret
et al. 1998, Grant 1999), but it has also been observed
in mice (Garrick et al. 1998) and Drosophila (Pal-Bhadra
et al. 1997, Birchler et al. 1999). In some studies
of insects, no gene silencing has been noted at all
(Spradling and Rubin 1983, Handler et al. 1998), and
where it has been noted the "silencing" is more of a
lowering of expression than it is a turning off of the
gene (Pal-Bhadra et al. 1997). Therefore, lethal effects
could occur even with gene silencing if high levels of
gene expression were not needed to interfere with
survival. Additionally, studies of mice (Garrick et al.
1998) and Drosophila (Pal-Bhadra et al. 1997) indicate
that the activation depends on the number of gene 
copies present in a specific generation. Therefore, as 
long as most of the insects have only one or two copies 
at the time that the conditional lethal is triggered, it 
will not matter how many copies are in each of the 
initially released insects. 

In summary, conditional lethal releases will only be 
appropriate for certain insect species. New techniques 
for genetic manipulation of insects could greatly im- 
prove the efficiency of conditional lethal releases, but 
there will always be a need for close collaboration 
between molecular geneticists, insect ecologists, and 
pest management specialists if we are to gain the most 
benefit from this technology.

Appendix 1. Derivation of the Algorithm for
Multilocus Mating and Selection

The goal is to track the genotype frequencies when 
a laboratory population of insects carrying a conditional
lethal trait at L loci is introduced into a wild
population. We assume that the target species is diploid.
It would be most advantageous to put each of the
inserted genes on a different chromosome or linkage 
group because physical linkage would slow the rate at 
which the introduced genes penetrate the wild pop- 
ulation. We therefore assume that all of the introduced 
loci are, in fact, unlinked (i.e., recombination fre- 
quency = 0.5 between all loci). Second, we assume 
that selection acts the same way on each of the loci. 
Under these two assumptions all genotypes and ga- 
mete types with the same number of introduced alleles 
will behave exactly the same way, and, in particular, 
will always be equal in frequency. This means that we 
just have to track one variable for each gamete class and
each genotype class: L + 1 gamete classes and
(L + 1)(L + 2)/2 genotype classes. The set of all gametes with the same number
of conditional lethal alleles will be referred to as a 
gamete class. The set of all genotypes with the same 
number of loci in homozygous form for the condi- 
tional lethal allele and the same number of loci with 
no conditional lethal alleles is called a genotype class. 

We will write gamete types as vectors of 1s and 0s,
where 1 denotes presence of the conditional lethal
allele and 0 denotes absence of the conditional lethal
allele. For example:

$$\begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$

represents a gamete with four loci of concern that has 
a conditional lethal allele on the first and third loci. 
The loci will be numbered 1 to L starting from the top. 
Genotypes will be represented similarly: 



Variables are defined as follows.

$L$: number of loci on which the conditional lethal allele has been introduced.

$Y$: gamete class. A scalar with the value being equal to the number of conditional lethal alleles in the gamete class.

$P(Y)$: the probability that a randomly chosen gamete is of class $Y$.

$Y_1$, $Y_2$: parental gamete classes. These are scalars, with the value being equal to the number of conditional lethal alleles in the gamete class.

$[Y_1, Y_2]$: parental gamete classes in a mating.

$X = [X_0, X_1]$: genotype class, where $X_0$ = the number of (0 0) pairs in the genotype (scalar) and $X_1$ = the number of (1 1) pairs in the genotype (scalar).

$P([X_0, X_1])$: the probability that a randomly chosen individual has genotype class $[X_0, X_1]$.

$f([X_0, X_1])$: fitness of the genotype relative to the genotype with maximum fitness.

$\bar{w} = \sum_X P(X) f(X)$: average fitness of the population.

$w(X) = f(X)/\bar{w}$: fitness of genotype $[X_0, X_1]$ relative to the population average fitness.

We want the distribution of the Ys (gamete classes) 
in the next generation given the distribution now. The 
frequency of a gamete class in the next generation is 
obtained by summing the probability of that gamete 
class being produced by a given mating over all pos- 
sible matings in the current generation: 

$$P(Y(t+1)) = \sum_{Y_1=0}^{L} \sum_{Y_2=0}^{L} P\big(Y(t+1) \mid [Y_1(t), Y_2(t)]\big)\, P([Y_1, Y_2]), \qquad [\mathrm{A1}]$$

where $P(Y(t+1) \mid [Y_1(t), Y_2(t)])$ is the probability of
gamete class $Y$ resulting from a union between parental
gametes of class $Y_1$ and $Y_2$, and $P([Y_1, Y_2])$ is the
probability of a union between gametes of class $Y_1$ and
$Y_2$ in the current generation.

To calculate $P(Y(t+1) \mid [Y_1, Y_2])$ we must consider the
distribution of genotype classes resulting from the
union of gamete class $Y_1$ with gamete class $Y_2$. Thus,
$P(Y(t+1) \mid [Y_1, Y_2])$ breaks down further:

$$P\big(Y(t+1) \mid [Y_1(t), Y_2(t)]\big) = \sum_{X} P(Y \mid X)\, P(X \mid [Y_1, Y_2]). \qquad [\mathrm{A2}]$$

$P(Y(t+1))$ is then

$$P(Y(t+1)) = \sum_{Y_1=0}^{L} \sum_{Y_2=0}^{L} \sum_{X} P(Y \mid X)\, P(X \mid [Y_1, Y_2])\, P([Y_1, Y_2]). \qquad [\mathrm{A3}]$$

We need to find expressions for the various parts of
this equation. We can get $P(Y \mid [X_0, X_1])$ easily. This is
the distribution of gametes produced by an individual
of genotype class $[X_0, X_1]$. $X_1$ tells us the number of (1 1)
pairs and $X_0$ tells us the number of (0 0) pairs in the
genotype. All of the remaining loci have either a (1 0)
or a (0 1). The resulting gamete will always have a
1-allele on loci with a (1 1) pair in the parental genotype
and a 0-allele on loci with a (0 0) pair in the
parental genotype. Because the recombination frequency
is 1/2, the number of 1-alleles arising (in
the next generation gametes) from the (1 0) and (0 1)
pairs is distributed as binomial($L - X_1 - X_0$, 1/2), where
$L - X_1 - X_0$ is the number of (1 0) and (0 1) pairs. We then
have

$$Y = X_1 + Z$$

where $Z$ follows a binomial distribution with
$L - X_1 - X_0$ trials and probability of success 1/2.
Then

$$P(Y \mid [X_0, X_1]) = \begin{cases} 0 & \text{if } X_1 > Y \text{ or } X_0 > L - Y \\ P(Z = Y - X_1) & \text{otherwise} \end{cases} \qquad [\mathrm{A4}]$$

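Equation A4 can be illustrated with a short Python sketch. The function name `gamete_class_distribution` is hypothetical (it does not appear in the original work); the code simply evaluates the binomial probability P(Z = Y - X1) for every gamete class, assuming unlinked loci as stated above.

```python
from math import comb

def gamete_class_distribution(x0, x1, L):
    """Return [P(Y = y | [x0, x1]) for y = 0..L], following equation A4.

    x0 and x1 are the numbers of (0 0) and (1 1) loci. The remaining
    L - x0 - x1 loci are heterozygous, and each passes on a 1-allele
    with probability 1/2 (unlinked loci).
    """
    het = L - x0 - x1  # number of (1 0) and (0 1) pairs
    dist = []
    for y in range(L + 1):
        if x1 > y or x0 > L - y:
            dist.append(0.0)  # this gamete class cannot be produced
        else:
            dist.append(comb(het, y - x1) * 0.5 ** het)
    return dist

# Four loci: one (0 0) locus, one (1 1) locus, two heterozygous loci.
print(gamete_class_distribution(1, 1, 4))  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

With one (1 1) locus and one (0 0) locus, the gamete must carry between one and three conditional lethal alleles, which is what the zero entries at both ends reflect.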



Under an assumption of random mating, we have 

$$P([Y_1, Y_2]) = P(Y_1)\, P(Y_2). \qquad [\mathrm{A5}]$$

Multiple matings per individual do not change any- 
thing provided that there is no sperm competition. 

The last piece that we need is $P([X_0, X_1] \mid [Y_1, Y_2])$,
the distribution of offspring genotype classes given the 
parental gamete classes. If we knew the specific pa- 
rental gametes, then we would know the offspring 
genotype exactly. However, we only know the num- 
ber of 1-alleles on each parental gamete. Knowing only 
this, there are usually multiple possibilities for the 
resulting offspring genotype. For example, if we have 
an introduction on three loci and pairing between a 
Y = 2 gamete and another Y = 2 gamete, there are a
number of possibilities as to what exact mating is 
occurring: 




In the first mating the parental gametes are the same 
on all three loci. In this case the offspring gamete is 
determined from the parental gametes, because there 
is only one possible allele at each locus. In the second
mating, the two gametes differ on the first two loci. 
Therefore, the offspring genotype is heterozygous on 
these two loci and gametes resulting from this geno- 
type can have either 1 or 0 alleles on both loci. Because 
the genotype is homozygous (1 1) on the third locus, 
alleles on the third locus in gametes produced by this 
genotype are all 1 -alleles. 

Only certain values for $X_0$ and $X_1$ are possible for a
given pair of parental gamete classes. We now determine
what those are. Because $Y_1$ and $Y_2$ are interchangeable,
we can assume $Y_1 \le Y_2$ without loss of
generality. For purposes of labeling, assume that $Y_1$ is
the paternal gamete and $Y_2$ is the maternal gamete.
This also causes no loss of generality. Assume there are
$L$ loci. We will start by determining the minimum
values of $X_0$ and $X_1$ given $Y_1$ and $Y_2$. We will number
the loci so that the $Y_1$ 1-alleles on the paternal gamete
are placed in locus positions 1 to $Y_1$. Now, the minimum
number of matching pairs (0 0 or 1 1) between
the gametes occurs if the maternal gamete (with $Y_2$
1-alleles) has its 1-alleles placed on locus $L$ and the loci
counting back from it (i.e., loci $L - Y_2 + 1$ to $L$). If $Y_1 + Y_2 > L$,
then there is an overlap between the 1-alleles on the
two gametes, and there will be $Y_1 + Y_2 - L$ pairs of
1-alleles. In this case there are no pairs of 0-alleles. If
$Y_1 + Y_2 < L$, then there is no overlap between 1-alleles
and there are $L - (Y_1 + Y_2)$ pairs of 0-alleles. If $Y_1 + Y_2 = L$,
then there are no pairs of 1-alleles and no pairs of
0-alleles.

Example 1: Take $L = 6$, $Y_1 = 2$, and $Y_2 = 3$. If we
arrange the 1-alleles as described to get the minimum
number of pairs, we have

$$\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$

In this case there are no pairs of 1-alleles, but there are
$L - (Y_1 + Y_2) = 6 - (2 + 3)$ = one pair of 0-alleles.

Example 2: Now take $L = 6$, $Y_1 = 3$, and $Y_2 = 5$. We
again arrange the 1-alleles as described above. Now
there is an overlap between the 1-alleles on $Y_1 + Y_2 - L$ =
3 + 5 - 6 = 2 loci:

$$\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$

$L - Y_1$ of the $Y_2$ maternal 1-alleles can be paired with
paternal 0-alleles. The remaining $Y_2 - (L - Y_1)$ maternal
1-alleles must be paired with paternal 1-alleles.

The relations above give the minimum possible
number of pairs given the union of gametes of classes
$Y_1$ and $Y_2$. The next lowest number of pairs occurs if
we exchange a 0 on the paternal gamete that is
matched with a 1 on the maternal gamete with a 1 on
the paternal gamete that is matched with a 0 on the
maternal gamete. The resulting paternal gamete will
be of the same gamete class, but there is now a new
(0 0) pair and a new (1 1) pair in the resulting genotype.
Every time we switch alleles like this we create a (0 0)
and a (1 1) pair.

The pairing of gametes in classes $Y_1$ and $Y_2$ that
produces the maximum number of pairs occurs when
the maternal gamete has 1-alleles on loci 1 to $Y_2$ (with
the 1-alleles on the $Y_1$ gamete still arranged as described
above). Then the number of (1 1) pairs is
equal to $Y_1$ and the number of (0 0) pairs is equal to
$L - Y_2$ (recall that $Y_1 \le Y_2$).

Example 3: Take $L = 6$, $Y_1 = 2$, and $Y_2 = 3$, as in
Example 1. Now assume that the specific gametes are
as follows:

$$\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

The number of (1 1) pairs is $Y_1$ = two and the number
of (0 0) pairs is $L - Y_2 = 3$.

In summary, if $Y_1 + Y_2 \le L$, then $X_1$ ranges from 0 to
$Y_1$ and $X_0$ ranges from $L - (Y_1 + Y_2)$ to $L - Y_2$. If $Y_1 + Y_2 > L$,
then $X_1$ ranges from $(Y_1 + Y_2) - L$ to $Y_1$ and $X_0$ ranges
from 0 to $L - Y_2$. [A6]

Thus, for any given $[Y_1, Y_2]$ we have determined the
set of possible values for $X_0$ and $X_1$.

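The last thing that we need is an expression for
$P([X_0, X_1] \mid [Y_1, Y_2])$ when $[X_0, X_1]$ is possible given
$[Y_1, Y_2]$. This requires application of simple combinatorics:

$$P([X_0, X_1] \mid [Y_1, Y_2]) = \frac{\left(\begin{array}{c}\text{number of ways}\\ \text{to arrange the (1 1)}\\ \text{pairs on } L \text{ loci}\end{array}\right) \left(\begin{array}{c}\text{number of ways to}\\ \text{arrange the (0 0) pairs}\\ \text{on the remaining loci}\end{array}\right) \left(\begin{array}{c}\text{number of ways to}\\ \text{arrange the (1 0) pairs}\\ \text{on the remaining loci}\end{array}\right)}{\left(\begin{array}{c}\text{total possible number of ways}\\ \text{to make an } L\text{-locus genotype out}\\ \text{of } Y_1 \text{ and } Y_2 \text{ parental gametes}\end{array}\right)}$$

$$= \frac{\binom{L}{X_1} \binom{L - X_1}{X_0} \binom{L - X_0 - X_1}{Y_1 - X_1}}{\binom{L}{Y_1} \binom{L}{Y_2}} \qquad [\mathrm{A7}]$$
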



Note that there is another factor, number of ways to
arrange the (0 1) pairs on the remaining loci, which
equals one and is not therefore shown. By including
this term and writing things out in terms of factorials
we can show the symmetry of this expression with
respect to $Y_1$ and $Y_2$. The numerator is the total number
of ways to get an $[X_0, X_1]$ genotype from parental
gametes $Y_1$ and $Y_2$. The denominator is the total number
of genotypes that can be made from parental
gametes of classes $Y_1$ and $Y_2$.
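The counting argument behind A6 and A7 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the third binomial factor (the number of ways to place the (1 0) pairs among the heterozygous loci) is an assumption about how the count is completed, chosen because it makes the probabilities for each pair of gamete classes sum to one.

```python
from math import comb

def feasible_genotype_classes(y1, y2, L):
    """List the (x0, x1) classes allowed by A6 for gamete classes y1, y2."""
    lo = max(0, y1 + y2 - L)   # forced overlap of 1-alleles
    hi = min(y1, y2)           # every 1-allele of the smaller class is matched
    # Each extra (1 1) pair creates one extra (0 0) pair as well.
    return [(L - y1 - y2 + x1, x1) for x1 in range(lo, hi + 1)]

def genotype_class_prob(x0, x1, y1, y2, L):
    """P([x0, x1] | [y1, y2]) by counting arrangements, as in A7."""
    if (x0, x1) not in feasible_genotype_classes(y1, y2, L):
        return 0.0
    ways = comb(L, x1) * comb(L - x1, x0) * comb(L - x0 - x1, y1 - x1)
    return ways / (comb(L, y1) * comb(L, y2))

# Examples 1 and 3 above: L = 6, Y1 = 2, Y2 = 3.
classes = feasible_genotype_classes(2, 3, 6)
probs = [genotype_class_prob(x0, x1, 2, 3, 6) for x0, x1 in classes]
print(classes)  # [(1, 0), (2, 1), (3, 2)]
print(probs)
```

The first and last classes returned are the minimum-pair and maximum-pair arrangements of Examples 1 and 3, and swapping the two gamete classes leaves each probability unchanged.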

Returning to equation A3, we have all of the pieces
(equations A4, A5, A7 and the set of possible values of
$X_0$ and $X_1$ given by A6), and the recursion for $P(Y, t)$
can be programmed in a straightforward manner. This
algorithm has been tested against the standard two and
three locus models with selection.
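As a rough illustration of how the pieces fit together, the following Python sketch performs one generation of the recursion. It is not the authors' program. All names are hypothetical, and two assumptions are made that the text does not spell out: selection is applied to zygote genotype classes before gametes are formed, and fitness is multiplicative with a fixed cost per conditional lethal allele (`cost`).

```python
from math import comb

def gamete_given_genotype(y, x0, x1, L):
    """A4: P(Y = y | [x0, x1]) for unlinked loci."""
    het = L - x0 - x1
    if x1 > y or x0 > L - y:
        return 0.0
    return comb(het, y - x1) * 0.5 ** het

def genotype_given_gametes(x0, x1, y1, y2, L):
    """A7: P([x0, x1] | [y1, y2]); x0 is fixed by x1 through A6."""
    if x1 < 0 or x1 > min(y1, y2) or x0 < 0 or x0 != L - y1 - y2 + x1:
        return 0.0
    ways = comb(L, x1) * comb(L - x1, x0) * comb(L - x0 - x1, y1 - x1)
    return ways / (comb(L, y1) * comb(L, y2))

def next_generation(p, L, cost=0.0):
    """Map gamete class frequencies p[0..L] to the next generation."""
    # Random union of gametes (A5) gives the zygote genotype classes.
    geno = {}
    for y1 in range(L + 1):
        for y2 in range(L + 1):
            pair = p[y1] * p[y2]
            if pair == 0.0:
                continue
            for x1 in range(max(0, y1 + y2 - L), min(y1, y2) + 1):
                x0 = L - y1 - y2 + x1
                q = pair * genotype_given_gametes(x0, x1, y1, y2, L)
                geno[(x0, x1)] = geno.get((x0, x1), 0.0) + q
    # Assumed selection step: multiplicative cost per allele carried.
    # A class [x0, x1] carries 2*x1 + (L - x0 - x1) = L + x1 - x0 alleles.
    weighted = {x: q * (1.0 - cost) ** (L + x[1] - x[0])
                for x, q in geno.items()}
    mean_fitness = sum(weighted.values())
    # Surviving genotypes produce the next generation's gametes (A4).
    new_p = [0.0] * (L + 1)
    for (x0, x1), q in weighted.items():
        for y in range(x1, L - x0 + 1):
            new_p[y] += (q / mean_fitness) * gamete_given_genotype(y, x0, x1, L)
    return new_p

# Three loci; 60% of gametes carry no conditional lethal allele, 40% carry all.
p0 = [0.6, 0.0, 0.0, 0.4]
p1 = next_generation(p0, 3)
print([round(v, 4) for v in p1])
```

With `cost=0.0` the allele frequency (the mean of Y divided by L) is unchanged from one generation to the next, which is a convenient sanity check; with a positive cost it declines.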



Appendix 2. Example of an All-Male and a Male-Female
Release

To illustrate the details of releases, we give an example
of an all-male and a male-female release.

2:1 All-Male Release. Assume that the wild population
has 50 males and 50 females. The release population
has 200 males. The frequency of the no-conditional
lethal genotype immediately after the 2:1
release is then 33% and the frequency of the all-conditional
lethal genotype is 67%. There is a total of 250
males, with 80% being released-type and 20% being
wild type. The gametic disequilibrium of the no-conditional
lethal (CL) gamete type in the release generation
is

$$D = (\text{freq. of no-CL gamete type}) - (1 - \text{CL gene freq.})^{L} \qquad [\mathrm{A8}]$$

$$D = 0.33 - 0.33^{L} \qquad [\mathrm{A9}]$$



No Selection. If there is no selection against the 
released type insects, then the release generation mat- 
ings will be 40 matings of type (wild female X released 
male) and 10 matings of type (wild female X wild 
male).
The offspring will then be 80% of genotype AaBbCc...
and 20% of genotype aabbcc..., where capital
letters denote a conditional lethal allele and lowercase
letters denote the absence of a conditional lethal allele.
The frequency of the conditional lethal allele is
0.5 X 0.8 = 0.4. The gametic disequilibrium in the
gametes going into the F1 generation is

$$D = (\text{freq. of no-CL gamete type}) - (1 - \text{CL gene freq.})^{L}$$

$$D = (0.8 \times 0.5 + 0.2) - (1 - 0.4)^{L} = 0.6 - 0.6^{L} \qquad [\mathrm{A10}]$$

It is clear that A10 is greater than A9 for all L. This 
happens because all no-conditional lethal females 
mate and therefore the proportion of no-conditional 
lethal gametes that go into matings is high. From this, 
we can understand the increase in gametic disequilibrium
in the F1 generations seen in Figs. 2b and 3d.
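The comparison between A9 and A10 can be tabulated with a few lines of Python. This is a sketch with hypothetical function names; the frequencies are those of the worked example (100 wild insects, 200 released males), and 1/3 is used in place of the rounded 0.33.

```python
def gametic_disequilibrium(no_cl_gamete_freq, cl_allele_freq, L):
    """D = (freq. of no-CL gamete type) - (1 - CL gene freq.)^L."""
    return no_cl_gamete_freq - (1.0 - cl_allele_freq) ** L

def release_generation_D(L):
    # A9: one third of the insects present after the release are wild.
    return gametic_disequilibrium(1 / 3, 2 / 3, L)

def f1_gamete_D(L):
    # A10: all female gametes are no-CL, as are 20% of the male gametes.
    no_cl = 0.5 * 1.0 + 0.5 * 0.2
    return gametic_disequilibrium(no_cl, 0.4, L)

for L in (1, 2, 5, 10, 20):
    print(L, round(release_generation_D(L), 4), round(f1_gamete_D(L), 4))
```

For a single locus both quantities are zero, since disequilibrium is only defined across loci; for every larger L in the range studied, the A10 value exceeds the A9 value.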

Selection Against Conditional Lethal Allele. If we
define $w_{CL}$ as the fitness of the all-conditional lethal
genotype and $w_{no}$ as the fitness of the no-conditional
lethal genotype, then the matings in the release generation
are as follows: (50)(0.8)($w_{CL}$) matings of type
(wild female X released male) and 50(1-0.8)$w_{no}$ matings
of type (wild female X wild male).

With no selection, the frequency of the no-conditional
lethal genotype dropped from 0.33 to 0.20 between
the release and F1 generations (see Fig. 2a). If
$w_{CL}$ is sufficiently small then (1-0.8)$w_{no}$ will exceed
0.33, and we will see an increase in the no-conditional
lethal genotype frequency in the F1 generation. This
increase will be greater for larger L because $w_{CL}$ decreases
with increasing L (as reflected, for example, in
Figs. 3a and 5a).

2:1 Male-Female Release. The male-female release
is simpler. We again assume that the wild population
consists of 50 males and 50 females. Then the released
population has 100 males and 100 females. Thus, 33% of each
gender is wild and 67% of each gender is released. The
release generation matings are as follows:
(50)$w_{no}$(0.67)$w_{CL}$ (wild female X released male),
(50)$w_{no}$(1-0.67)$w_{no}$ (wild female X wild male),
(100)$w_{CL}$(0.67)$w_{CL}$ (released female X released
male), and (100)$w_{CL}$(1-0.67)$w_{no}$ (released female X
wild male).

If there is no selection against the conditional lethal
allele (thus $w_{no} = w_{CL} = 1$), then the fraction of
no-conditional lethal genotypes will be 50(1-0.67)/
300 ≈ 0.05. Recall that the all-male release had a
no-conditional lethal genotype frequency of 0.2 in the
F1 generation with no selection. However, note that
the number of matings producing the no-conditional
lethal genotype is higher in the male-female release
than the all-male release (16 versus 10). The frequency
is lower because the total number of matings
is higher when females are released. This explains why
the no-conditional lethal frequency is higher in the F1
generation in the all-male release in Fig. 8 than in the
male-female release.

If there is selection against the conditional lethal
allele, then the frequency of the no-conditional lethal
genotype in the F1 generation will be (50/300)
("-'»«) (1-0.67). This increases as increases (because
w_no increases and w_CL decreases).
ABSTRACT

Four genes encoding the major egg yolk polypeptides of the Mediterranean fruit fly Ceratitis 
capitata, vitellogenins 1 and 2 (VG1 and VG2), were cloned, characterized and partially sequenced.
The genes are located on the same region of chromosome 5 and are organized in pairs, each encoding 
the two polypeptides on opposite DNA strands. Restriction and nucleotide sequence analysis indicate 
that the gene pairs have arisen from an ancestral pair by a relatively recent duplication event. The 
transcribed part is very similar to that of the Drosophila melanogaster yolk protein genes Yp1, Yp2 and
Yp3. The Vg1 genes have two introns at the same positions as those in D. melanogaster Yp3; the Vg2
genes have only one of the introns, as do D. melanogaster Yp1 and Yp2. Comparison of the five
polypeptide sequences shows extensive homology, with 27% of the residues being invariable. The 
sequence similarity of the processed proteins extends in two regions separated by a nonconserved 
region of varying size. Secondary structure predictions suggest a highly conserved secondary structure 
pattern in the two regions, which probably correspond to structural and functional domains. The 
carboxy-end domain of the C. capitata proteins shows the same sequence similarities with triacylglycerol 
lipases that have been reported previously for the D. melanogaster yolk proteins. Analysis of codon 
usage shows significant differences between D. melanogaster and C. capitata vitellogenins with the 
latter exhibiting a less biased representation of synonymous codons. 



THE major egg yolk proteins (vitellogenins) of 
higher Diptera are polypeptides of 44,000 to 
50,000 daltons and differ from those of other egg 
laying animals. In contrast, the vitellogenins from 
species as diverse as the locust, nematode, frog and 
chicken probably have a common evolutionary origin. 
They are generally larger in size, are encoded by 
multigene families and share amino acid sequence 
similarities; no significant sequence similarities can be 
detected between them and the dipteran yolk proteins 
(Spieth et al. 1985; Nardelli et al. 1987). The latter
show local amino acid sequence similarity with members
of the triacylglycerol lipase family (Bownes et al. 1988).

The best studied dipteran vitellogenins are the 
three yolk proteins of Drosophila melanogaster. These 
polypeptides, designated YP1, YP2 and YP3, are syn- 
thesized in the fat body of adult females, secreted in 
the hemolymph and taken up by developing oocytes. 
In addition, D. melanogaster vitellogenins are synthe- 
sized in developing follicular epithelial cells and trans- 
ported directly into the oocyte (Bownes and Hames, 
1978; Warren and Mahowald, 1979; Brennan et al. 1982).
Their transcription is stimulated by ecdysteroid
hormones; β-ecdysone induces yolk protein
synthesis in the fat body of adult males, which normally
do not produce yolk proteins (Postlethwait,
Bownes and Jowett 1980). The three proteins are
highly related to each other and are encoded by
single-copy genes (Yp1, Yp2 and Yp3) which are localized on
the X chromosome. Genes Yp1 and Yp2 are transcribed
divergently and are separated by 1.2 kb of DNA, 
while Yp3 is located approximately one megabase away 
(Hung and Wensink 1983; Garabedian et al. 1987;
Yan, Kunert and Postlethwait 1987). Tissue-specific
transcriptional enhancers common for both genes
are localized in the Yp1/Yp2 intergenic region
(Garabedian, Hung and Wensink 1985; Garabedian,
Shepherd and Wensink 1986). A very similar arrangement
was described for the vitellogenin genes
of Drosophila grimshawi, a Hawaiian endemic species
which belongs to a different sub-genus than D. melanogaster.
This species has three genes which cross-hybridize
to each other and to the melanogaster Yp
genes; S1 nuclease analysis has shown that two of the
genes are closely linked and transcribed with opposite
orientations with their 5' ends 1.75 kb apart
(Hatzopoulos and Kambysellis 1987).

The D. melanogaster vitellogenins are characterized
by three regions which show extensive primary and
secondary structure homology among the genes but
not between each other, separated by non-conserved
regions with variable length (Hung and Wensink
1983). One of the conserved regions has significant
sequence similarity with part of the lipid-binding domain
of triacylglycerol lipases; it has been suggested
that this region has a lipid-binding function (Bownes
et al. 1988; Persson et al. 1989).

The vitellogenins of Ceratitis capitata have been
studied in considerable detail. This species has two
major yolk polypeptides, designated VG1 and VG2,
with molecular weights of 49,000 and 46,000 daltons
respectively. They have been purified and show immunological
cross-reactivity with the D. melanogaster
homologs (Rina and Mintzas 1987, 1988). As in D.
melanogaster (Kozma and Bownes 1986), they are
synthesized in the fat body and follicular epithelial
cells and are induced in males by β-ecdysone (Rina
and Mintzas 1988).

The Mediterranean fruit fly C. capitata (family Te- 
phritidae; medfly) is a higher dipteran which presents 
several important advantages as an organism of choice 
for comparative molecular studies with D. melano- 
gaster. It is phylogenetically close enough to D. mela- 
nogaster to allow cloning of Drosophila gene homologs 
by interspecific nucleic acid hybridization, but distant 
enough for comparisons to be meaningful. Further- 
more, it has been adapted easily to inexpensive labo- 
ratory culture, it has a 24-day life cycle, and has well 
characterized polytene chromosomes (Zacharopou- 
lou 1990). Last but not least, medfly is an insect of 
economic importance amenable to biological control 
such as the sterile male technique. Cloning of two
chorion protein genes and one actin gene from C.
capitata was reported recently (Konsolaki et al. 1990;
Tolias et al. 1990; Haymer et al. 1990). Of these,
the actin gene and one of the chorion genes (Ccs36)
were cloned by heterologous hybridization to D. melanogaster
probes, while chorion gene Ccs38 was cloned
by a differential screening procedure; gene Ccs38 was
subsequently shown to cross-hybridize with the D.
melanogaster s38 gene.

In an effort to obtain and analyze C. capitata pro- 
moters that are expressed in only one sex, we cloned 
the genes encoding vitellogenins. Here we report the 
structure of these genes and the results of DNA se- 
quence analysis. 

MATERIALS AND METHODS 

Flies and materials: A C. capitata strain obtained from
A. Mintzas (Department of Biology, University of Patras,
Greece) was used for all experiments. The strain was originally
established in the laboratory by P. A. Mourikis
(Benakeion Institute of Phytopathology, Athens, Greece) with
flies from the Southern Peloponnese (Greece) and Palermo
(Italy). Insects were raised at 22-25° as described previously
(Mintzas et al. 1983). Adults were maintained on a one
part sucrose to one part dried yeast diet. Under these
conditions, embryonic development lasts about 48 hours,
and the complete life cycle of the insect is approximately 24
days.

32P-labeled nucleotides were from Amersham, and
restriction and modification enzymes from MinoTech 
(Heraklion), Pharmacia and Bethesda Research Laborato- 
ries. 

Construction and screening of the genomic library:
DNA (20 µg) from 24-hr-old medfly embryos was partially
digested with restriction endonuclease MboI, extracted with
phenol/chloroform, precipitated with ethanol, and fractionated
on a 10-40% sucrose gradient. Fractions containing
fragments of 15-20 kb in length were retained for ligation
to vector DNA. For ligation, 0.25 µg genomic DNA fragments
were combined with 0.85 µg lambda EMBL4 arms
produced by digestion of phage DNA with BamHI. In vitro
packaging was as described previously (Maniatis, Fritsch
and Sambrook 1982). Approximately 250,000 plaques were
screened, as described by Benton and Davis (1977), by 
hybridization at 55° to a D. grimshawi cDNA clone (Hat- 
zopoulos and Kambysellis 1987) corresponding to the 
vitellogenin 1 mRNA of this species. This clone was used 
because it was available to us and had been shown to cross- 
hybridize strongly with its D. melanogaster homolog. 

General methods: Genomic DNA was prepared essen- 
tially as described previously (Holmes and Bonner 1973). 
Preparation of phage and plasmid DNA, agarose gel elec- 
trophoresis of DNA, and blotting to nitrocellulose mem- 
branes were carried out using standard procedures (Man- 
iatis, Fritsch and Sambrook 1982). DNA probes were 
prepared by nick-translation (Maniatis, Fritsch and Sambrook
1982) or by random hexanucleotide priming (Feinberg
and Vogelstein 1983). Hybridizations of 32P-labeled
probes to blotted nucleic acids were performed as described
by Maniatis, Fritsch and Sambrook (1982), at 38° (D.
grimshawi probe) or 42° (C. capitata probes) in 50% formamide,
5× SSC, 0.5% SDS, 10 mM EDTA, 100 µg/ml
sonicated, heat-denatured herring sperm DNA, and 5×
Denhardt (1966) solution. DNA sequencing was done by
the double stranded dideoxy chain termination method
(Wallace et al. 1981).

DNA and protein sequence analysis: The program pack- 
ages ANALYSEQ and ANALYSEP (Staden 1984) were 
used for sequence analysis. Optimal alignment of protein 
sequences was carried out by the IALIGN program (Day- 
hoff, Barker and Hunt 1983) of the Protein Identification 
Resource (National Biomedical Research Foundation) and 
by the multiple alignment program CLUSTAL (Higgins
and Sharp 1988). All programs were run on a VAX/VMS
computer. Graphics were processed and plotted with a Macintosh
microcomputer running terminal emulation software.

RESULTS AND DISCUSSION 

Cloning of C. capitata vitellogenin genes: Figure 
1 shows the results of a hybridization experiment in 
which vitellogenin DNA from another higher dip- 
teran, D. grimshawi, was used to probe a C. capitata 
genomic DNA blot at low hybridization stringency. 
The probe was a cDNA clone (plasmid clone c357)
which contains approximately 900 nt corresponding
to the carboxy-terminal half of the Vitellogenin 1
mRNA from D. grimshawi (Hatzopoulos and Kambysellis
1987). At least four prominent HindIII fragments
and two EcoRI fragments were detected by this
probe in C. capitata DNA. The same probe hybridized
Figure 2. — Restriction maps and arrangement of C. capitata genomic
clones containing vitellogenin-related sequences. Restriction maps of
the four overlapping clones ccv72, ccv53, ccv51 and ccv71 are shown at
the top. The sites shown are: BamHI (B), SalI (S), HindIII (H) and EcoRI
(E). The bars below clones ccv72 and ccv53 correspond to the fragments
produced by double BamHI/HindIII digestion that hybridize to the D.
grimshawi probe. The composite map of the cluster is shown below the
clones. The four vitellogenin genes (α to δ) were identified by sequencing
(see text). The restriction map of the non-overlapping clone ccv81 is
shown below the size scale.



script in preparation). We have concentrated our 
analysis on the genes of the α-δ cluster.

Structure and chromosomal localization of the γ
and δ genes; conserved features between C. capitata
and D. melanogaster: Sequencing of the γ and δ
regions showed that each contains a gene highly homologous
to the D. melanogaster yolk protein genes.
Figure 3 shows the sequence of 2364 bp of genomic
DNA covering the γ region. The sequence extends
from base no. +953 to base -1411 relative to the
HindIII site at 23 kb of the composite map shown in
Figure 2. Conceptual translation in all six frames
showed three open reading frames (all in the orientation
indicated in Figure 2) with significant similarity
to the D. melanogaster yolk polypeptides, suggesting
the presence of two introns. By aligning the derived
polypeptide sequences to the available D. melanogaster
yolk protein sequences we arrived at the intron/exon
structure indicated in Figure 3. The first coding part
begins with an ATG at base 433 of the sequence and
ends at position 655, which is the first base of codon
74. The nucleotide sequence surrounding the initiation
codon (CAACATG) is in good agreement
with the consensus sequence (C/A)AA(A/C)ATG
flanking translational start sites in Drosophila (Cavener
1987). A 67-bp intron separates the first coding
part from a second, 389-bp exon beginning at base
723. A second intron is placed between bases 1112
and 1179. The 5' and 3' ends of both introns conform
to consensus sequences (GT(A/G)AGT ...
YNYYYYNYAG) (Mount 1982; Teem et al.
1984); in addition, both introns contain versions of
the internal splice signal (C/T)T(A/G)A(C/T) (Keller
and Noon 1984) upstream from the 3' splice site.
The third coding part is 699 bp long, ending with
a TAA at base 1879. Two tandem repeats of the
consensus polyadenylation signal sequence AATAAAA
(Proudfoot and Brownlee 1976) are
located 100 nucleotides downstream from the termination
codon.

The 1702-bp sequence from the δ region is shown
in Figure 4. This sequence extends from base -1137
to base +565 relative to the BamHI site at approximately
30.3 kb of the composite map shown in Figure
2. The sequence contains two open reading frames
which are read in the opposite direction from the γ
gene and also show considerable sequence similarity
to the D. melanogaster Yp genes. The proposed intron/
exon structure of the gene is as follows. The first
coding part is 211 bp long and begins with the ATG
at base 237 of the sequence. This ATG is also embedded
within a sequence similar to the Drosophila
consensus (see Figure 6A). The exon ends at position
447, which is the first base of codon 71, and is followed
by an 89-bp intron with acceptable splice signals.
The second coding part is 1055 bp long, ending with
a TAA codon at base 1592.
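
The coordinates given for the two genes are internally consistent, which can be checked with a short script. This is a minimal sketch, not part of the original analysis: the exon end points that the text does not state explicitly (the ends of exons 2 and 3 of the γ gene, and the start and end of exon 2 of the δ gene) are inferred here from the stated exon and intron lengths, and the function names are hypothetical.

```python
# Exon coordinates (1-based, inclusive), excluding the TAA stop codon.
# Values not stated directly in the text are inferred from the stated
# exon and intron lengths (an assumption of this sketch).
GENES = {
    "gamma": [(433, 655), (723, 1111), (1180, 1878)],
    "delta": [(237, 447), (537, 1591)],
}


def exon_lengths(exons):
    """Length of each exon in nucleotides."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons):
    """Length of each intron, i.e. the gap between consecutive exons."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(exons, exons[1:])]


def summarize(exons):
    """Coding length, codon count and phase of the first intron."""
    lengths = exon_lengths(exons)
    coding = sum(lengths)
    return {
        "exons": lengths,
        "introns": intron_lengths(exons),
        "coding_nt": coding,
        "codons": coding // 3,
        # Phase 1 means the intron follows the first base of a codon.
        "first_intron_phase": lengths[0] % 3,
    }


for name, exons in GENES.items():
    print(name, summarize(exons))
```

Under these assumptions both coding regions are exact multiples of three, and the first intron of each gene interrupts a codon after its first base.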

Although the proposed intron/exon structures of
the γ and δ genes were not subjected to direct testing,
such as S1 protection experiments or cDNA sequence
analysis, we believe that they represent the correct
structures because of their striking similarity to the D.
melanogaster yolk protein genes. D. melanogaster has
three genes, Yp1, Yp2 and Yp3, coding for the three
yolk proteins found in this species. The structures and
the arrangement of consensus sequences are very similar
in these genes. Yp1 and Yp2 have a short exon
followed by a single short intron (75 bp in Yp1 and 68
bp in Yp2) and then by a longer exon. Yp3 has two
short introns of 62 and 72 bp; the first is at the same
position as the intron in Yp1 and Yp2. The 5' consensus
sequences of the three genes (TATA box, capping
site and translation initiation consensus sequence) are
also at very similar positions; in addition, all three
genes have rather short (51 to 61 bp) 5' untranslated
regions (Hung and Wensink 1983; Garabedian et
Figure 3. — Nucleotide sequence of the Vg1-γ gene and surrounds.
The predicted amino acid sequence is shown above the
DNA sequence. The putative TATA box, cap site, initiation and 



al. 1987; Yan, Kunert and Postlethwait 1987).
The two sequenced medfly genes have the following
structural features in common with their D. melanogaster
homologues:

Sequence alignment and position of introns: Removal
of the two introns from the γ gene gives an open
reading frame of 1311 nucleotides, which can encode
a 437 amino acid polypeptide with a molecular weight
of 48,122 daltons. Similarly, the δ gene can encode
a 422 amino acid polypeptide with a molecular weight
of 45,434 daltons. Medfly vitellogenins 1 and 2 have
molecular weights of 49,000 and 46,000 daltons,
respectively (Rina and Mintzas 1987). Based on the
good agreement between the calculated and the observed
molecular weights, we suggest that the γ gene
encodes VG1 and the δ gene encodes VG2. Figure 5
is an alignment of these deduced polypeptide sequences
to the known sequences of the three D. melanogaster
yolk proteins. The five sequences can be
aligned with only two major gaps, which are located
within exons. This alignment illustrates that the position
of the introns is strictly conserved in the five
genes: Intron 1 of the Vg1 gene and the single intron
of the Vg2 gene are at the same position as the intron
present in all D. melanogaster yolk protein genes, i.e.,
after the first base of a codon for tyrosine (Hung and
Wensink 1983); intron 2 of the Vg1 gene is at the
same position as the second intron of the D. melanogaster
Yp3 gene (Garabedian et al. 1987). With respect
to intron/exon structure, therefore, the Vg1
gene appears to be homologous to the Yp3 gene, while
the Vg2 gene appears to be homologous to the Yp1
and Yp2 genes.

Size of introns: The two introns in the Vg1 gene and
the single intron in the Vg2 gene are, as in the D.
melanogaster yolk protein genes (Hung and Wensink
1983; Garabedian et al. 1987), very short: 67 and 89
nucleotides for intron 1 of the Vg1 and Vg2 genes
respectively, and 68 nucleotides for intron 2 of Vg1.
Figure 6B shows a comparison of intron sizes and
consensus splicing sequences between the vitellogenin
genes of C. capitata and D. melanogaster.

5' Consensus sequences: The D. melanogaster yolk
protein genes have, as most eukaryotic genes, a TATA
(Hogness-Goldberg) box beginning at position -29 to
-32 relative to the capping nucleotide (Hung and
Wensink 1983; Garabedian et al. 1987; Yan, Kunert
and Postlethwait 1987). In addition, the capping
sites of all three genes match the insect cap site
consensus sequence (Hultmark, Klementz and
Gehring 1986). The γ gene has the sequence TATATAA
between bases -61 and -55 relative to
the initiation ATG; the δ gene has a TATAAAA

termination codons, polyadenylation signals, and first and last two
bases of each intron are underlined. Numbers refer to nucleotide 
and amino acid positions. 
[Sequence listing for Figure 4: nucleotides 1 to 1702 of the Vg2-δ region, with the deduced 422-residue amino acid sequence shown above the DNA sequence.]

Figure 4. — Nucleotide sequence of the Vg2-δ gene and surrounds.
The predicted amino acid sequence is shown above the
DNA sequence. The putative TATA box, cap site, initiation and 
termination codons, polyadenylation signals, and first and last two 
bases of the intron are underlined. Numbers refer to nucleotide 
and amino acid positions. 

between -109 and -103 from the ATG. Alignment
of the five nucleotide sequences at the TATA boxes
showed that each of the medfly genes has a capping
site-like sequence at the canonical distance downstream
of its putative TATA box. The γ gene has the
sequence CACAGTT and the δ gene has the
sequence TACAGTT 30 and 32 bases downstream
from the TATA, respectively. These heptanucleotides
represent 5/7 matches to the consensus
insect cap site ATCA(G/T)T(C/T). These features
are shown in Figure 6A.
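
The 5/7 figure can be reproduced by scoring each heptanucleotide against the consensus position by position. The following is a minimal sketch; the function and constant names are hypothetical, and the consensus is encoded as the set of bases allowed at each position.

```python
# Insect cap site consensus ATCA(G/T)T(C/T), one entry per position;
# each entry lists the bases accepted at that position.
CAP_SITE_CONSENSUS = ["A", "T", "C", "A", "GT", "T", "CT"]


def count_matches(site, consensus=CAP_SITE_CONSENSUS):
    """Number of positions at which `site` agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in allowed for base, allowed in zip(site.upper(), consensus))


for gene, site in (("gamma", "CACAGTT"), ("delta", "TACAGTT")):
    print(f"{gene}: {site} matches {count_matches(site)}/{len(site)} positions")
```

In both genes the two mismatches fall on the first two positions of the heptanucleotide.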

Size of the 5' untranslated regions: The D. melanogaster
yolk protein mRNAs have rather short 5' leaders,
61, 51 and 56 nucleotides for Yp1, Yp2 and Yp3,
respectively (Hung and Wensink 1983; Garabedian
et al. 1987; Yan, Kunert and Postlethwait 1987).
If the putative capping sites identified are used by the
γ and δ genes, then their 5' sequences are also rather
short, ca. 30 and 76 nucleotides, respectively.

Similarities in the 5' flanking sequences: As in D.
melanogaster, the sequence homology observed in the
coding parts of the C. capitata vitellogenin genes does
not extend into the 5' flanking DNA. However, several
short nucleotide sequences have been identified
in D. melanogaster, which are repeated several times
in the 5' flanking DNA of the yolk protein genes
(Garabedian et al. 1987; Yan, Kunert and
Postlethwait 1987). The heptamer (A/T)(A/T)TGCAA
or its complement is encountered seven
times within 800 bp upstream from the Yp3 gene, and
five times in the 1.2 kb intergenic region between Yp1
and Yp2 (Yan, Kunert and Postlethwait 1987).
Matches to this heptamer or its complement are found
at positions -88 and -343 (relative to the first T of
the TATA box) of the γ gene, and at positions -62
and -72 of the δ gene (Figure 6C). A comparison
between D. melanogaster and C. capitata flanking sequences
also showed the presence of single copies of
the sequence GAGNTCAA G/T GpT C G/C at
distances from -575 to -124 relative to the TATA
box in the Yp2, Yp3, γ and δ genes (Figure 6D). The
consensus, and indeed three of the four actual sequences,
are close relatives (9 of 12 nucleotides) to
the sequence GGGTTCAATGCA found at
the ecdysone responsive element of the D. melanogaster
hsp27 gene promoter (Riddihough and Pelham
1987). Although the significance of these similarities
is not known, transformation experiments in
which in vitro modified medfly genes are introduced
into the D. melanogaster genome would test the possibility
that they represent regulatory elements conserved
between D. melanogaster and C. capitata.
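
A search for the heptamer "or its complement" amounts to scanning both strands of the flanking DNA. The following is a minimal sketch of such a scan, not the procedure used in the original analysis; the function names are hypothetical and the demonstration sequence is made up for illustration only.

```python
import re

# (A/T)(A/T)TGCAA; the lookahead lets overlapping matches be reported.
HEPTAMER = re.compile(r"(?=([AT][AT]TGCAA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_heptamers(seq):
    """Return sorted (0-based start on the given strand, strand) pairs."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in HEPTAMER.finditer(seq)]
    # A match starting at i on the reverse complement covers the
    # forward-strand positions n - i - 7 through n - i - 1.
    hits += [(n - m.start() - 7, "-")
             for m in HEPTAMER.finditer(reverse_complement(seq))]
    return sorted(hits)


# Made-up sequence: one forward copy (ATTGCAA) and one copy on the
# opposite strand (TTGCATA is the reverse complement of TATGCAA).
demo = "GGATTGCAACCTTGCATAGG"
print(find_heptamers(demo))
```

Positions reported this way are offsets within the scanned string; converting them to coordinates relative to the first T of the TATA box requires knowing where that base lies in the sequence.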

Another similarity between Drosophila and medfly 
vitellogenin genes is their position in the genome. In 
Drosophila these genes are on the X chromosome, 
while in medfly they are located on chromosome 5. 
The X chromosome of the medfly is heterochromatic 
(Zacharopoulou 1990), and recent in situ hybridi- 
zation studies of medfly polytene chromosomes have 
revealed that several medfly genes homologous to 
Figure 5. — Comparison of the amino acid sequences of the three
D. melanogaster yolk polypeptides to the deduced C. capitata vitellogenin
polypeptide sequences. YP1, D. melanogaster yolk protein 1
(NBRF database accession No. A03332); YP2, D. melanogaster yolk
protein 2 (NBRF database accession No. A03333); YP3, D. melanogaster
yolk protein 3 (sequence from Garabedian et al. 1987); VG1



Drosophila X-linked genes (including the vitellogenins
and the two chorion genes s36 and s38) are located
on chromosome 5 (A. Zacharopoulou, personal
communication). These syntenic associations, combined
with those discovered for other medfly genes
by Malacrida et al. (1986), further support Muller's
hypothesis about the evolution of the Diptera
(Muller 1940) and add to the significance of comparisons
between medfly and Drosophila.

The α and β genes have arisen from a recent
duplication of the γ-δ pair: The restriction map of
the α-β region shown in Figure 2 suggests that this
region represents an inverted duplication of the γ-δ
region. This was confirmed by partially sequencing
the α and β loci.

Two parts of the α locus (429 and 565 bp) were
sequenced. The first part is identical to bases 1 to 429
of the δ gene, with three differences: a 202A → G
substitution at the 5' untranslated region, a 260T →
C conservative substitution at codon 8, and 363G →
A, which results in a replacement, 43Asn → Ile. The
second part is identical to bases 1138 to 1702 of the
δ sequence, with three differences: a 1330T → C
conservative substitution at codon 335, and two A →
C changes at bases 1593 and 1594; these changes
replace the TAA termination codon with a codon for
serine, which is then followed by codons GAC, AAC
and a termination codon, TAA. As a result, the carboxy
terminus of the polypeptide coded by the α gene
differs from that of the δ gene product by an extra
SerAspAsn tripeptide.
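
The effect of the two A to C changes on the reading frame can be traced codon by codon. The following is a minimal sketch using only the four codons named in the text; the codon table is a small subset of the standard genetic code, and the variable and function names are hypothetical.

```python
# Subset of the standard genetic code covering the codons in question.
CODON_TABLE = {"TAA": "*", "TCC": "Ser", "GAC": "Asp", "AAC": "Asn"}


def translate(codons):
    """Translate codons into residues, stopping at the first stop codon."""
    residues = []
    for codon in codons:
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        residues.append(residue)
    return residues


# Reading frame from the stop codon onward: TAA, then GAC, AAC, TAA.
delta_tail = ["TAA", "GAC", "AAC", "TAA"]

# A -> C at the second and third positions of the stop codon
# (bases 1593 and 1594) turns TAA into TCC.
mutated_stop = delta_tail[0][0] + "CC"
alpha_tail = [mutated_stop] + delta_tail[1:]

print("delta gene:", translate(delta_tail))
print("alpha gene:", translate(alpha_tail))
```

Translation of the unmodified frame stops immediately, whereas the frame carrying the two substitutions reads through three further codons before reaching the next TAA.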

Similar results were obtained by partially sequencing
the β locus. The sequenced part (992 bp) is more
than 98% identical to nucleotides 1130 to 2121 of the
γ gene. Of the thirteen differences found, five are in
intron 2 (1134A → T, 1153A → G, 1157G → A,
1168T → C, 1174C → G), seven are located downstream
from the termination codon (1971T → G,
2056A → G, 2069C → G, 2071T → C, 2074T → C,
2093C → T, 2110G → A) and only one occurs in
exon 3 (1187A → T, resulting in a replacement,
206Asn → Ile).
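
The distribution of the thirteen differences and the overall identity follow from the listed positions. The following is a minimal sketch, assuming the intron 2 boundaries (bases 1112 to 1179) and the position of the termination codon (base 1879) given earlier for the γ gene; the function name is hypothetical.

```python
# Positions in the gamma sequence at which the beta locus differs.
DIFFERENCES = [1134, 1153, 1157, 1168, 1174, 1187,
               1971, 2056, 2069, 2071, 2074, 2093, 2110]

INTRON2 = (1112, 1179)   # intron 2 of the gamma gene
STOP_START = 1879        # first base of the TAA termination codon
SPAN = (1130, 2121)      # part of the gamma sequence that was compared


def classify(position):
    """Assign a position within SPAN to intron 2, exon 3 or the 3' flank."""
    if INTRON2[0] <= position <= INTRON2[1]:
        return "intron 2"
    if position < STOP_START:
        return "exon 3"
    return "downstream"


counts = {}
for pos in DIFFERENCES:
    region = classify(pos)
    counts[region] = counts.get(region, 0) + 1

length = SPAN[1] - SPAN[0] + 1
identity = 100.0 * (length - len(DIFFERENCES)) / length

print(counts)
print(f"{length} bp compared, {identity:.2f}% identical")
```

Only one of the thirteen changes falls in coding sequence, which is the pattern the text goes on to use as evidence that the duplicated genes remain functional.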

We conclude that the α-β and γ-δ pairs of genes

and VG2: C. capitata vitellogenin 1 and 2 (genes γ and δ), respectively.
The five proteins were aligned using the program CLUSTAL
(Higgins and Sharp 1988). The positions of the gaps were first
determined by two-way comparisons using program IALIGN (Dayhoff,
Barker and Hunt 1983) and then adjusted manually for
maximum similarity. Residues which are identical in at least four of
the sequences are boxed. The position of the introns is indicated
by filled triangles (intron 2 is found only in D. melanogaster Yp3 and
in C. capitata Vg1 genes). The similarity of these sequences to the
pig triacylglycerol lipase (NBRF database accession No. A00732) is
shown above the YP1 sequence. Capital letters show identities
between lipase and at least two of the vitellogenins; lower-case
letters indicate conservative replacements. Numbers indicate amino
acid positions. Three small gaps in the lipase/vitellogenin alignment
are not shown.
Figure 6. — Sequence similarities at noncoding regions in the vitellogenin genes of D. melanogaster and C. capitata. A, Regions of nucleotide
sequence similarity around the transcription and translation initiation sites. B, Comparison of the splice sites and intron sizes. C, Occurrences
of the repeated heptamer (see text). D, Matches to the D. melanogaster hsp27 ecdysone response element. For comparison, nucleotide positions
in C and D are relative to the first T of the TATA or putative TATA boxes.



have arisen from a relatively recent duplication event. 
The data are not sufficient to rule out the possibility 
that α and β are pseudogenes. The findings, however,
that the potentially coding regions have diverged less
than the noncoding regions, and that all parts of the
partial open reading frames which have been sequenced
in α and β (450 codons in total) are free of
stop codons, strongly suggest that these genes code
for variants of the vitellogenins 2 and 1 respectively.
This is also supported by sequence analysis of vitello- 
genin cDNA clones from medfly ovaries (K. Paliak- 
asis and C. Savakis, unpublished results). 

D. melanogaster and C. capitata vitellogenin proteins
show extensive sequence and structural conservation:
A striking degree of conservation between
the D. melanogaster and the C. capitata vitellogenins is
revealed when the five amino acid sequences are 
aligned for maximum similarity. This conservation 
pertains to primary sequence, hydrophobicity patterns 
and predicted secondary structure (Figures 5 and 7). 
For comparison, we divide each sequence into five 
regions (a to e in Figure 7): 

The conserved amino terminal region of all the
proteins (region a) is 19 or 20 residues long and
hydrophobic (Fig. 7, bottom); we conclude this is the
signal sequence for secretion (Blobel and Dobberstein
1975).

Region b, corresponding to residues 26 to 159 of 
YP1, is characterized by a low degree of sequence 
conservation: Twenty-six residues (19%) are invariant
in all five proteins. However, there are virtually no
gaps in the alignment (only a two-residue deletion in
YP1) and there are several conservative replacements,
which result in a pronounced conservation of second- 
ary structure and hydrophobicity patterns (Figure 7). 
A short region between regions a and b cannot be 
aligned without the introduction of insertions/dele- 
tions. 

Region c corresponds to amino acids 160 to 201 of 
YP1, and shows no apparent conservation, with the 
exception of a SerSerGluGlu sequence shared by all
proteins. This region varies in size, from 42 amino 
acids in YP1 to 33 amino acids in VG2. It contains 
many amino acids with charged and polar side chains 
(Figure 7, bottom), but does not seem to be conserved 
at the secondary structure level. 

Region d is the most conserved one. It is 228 amino 
acids long, spanning residues 202 to 427 of YP1. The 
D. melanogaster vitellogenins. This drawback, which is
probably caused by the presence of multiple vitellogenin
genes in the genome, could be overcome by
introducing in vitro mutagenized gene copies into the
germline and studying their effects on the wild-type
genes, as originally suggested by Herskowitz (1987).

From an evolutionary point of view, it is remarkable that
the vitellogenins of higher diptera have a different 
ancestry from all other known vitellogenins, such as 
those of the chicken, frog, locust, and nematode. The 
latter are generally longer proteins encoded by genes 
which are interrupted by many introns and share 
sequence similarities which indicate a common evolu- 
tionary origin. No sequence similarity can be detected 
between the dipteran and the other vitellogenins. It 
appears, therefore, that during the evolution of higher 
diptera the functions of the major yolk proteins were 
taken over by a gene which has common origin with 
the present day triacylglycerol lipases. It is intriguing 
that another protein of diptera, the enzyme alcohol 
dehydrogenase (ADH), has an analogous evolutionary 
history. The D. melanogaster enzyme is related to
prokaryotic ribitol dehydrogenases and shows no se- 
quence or intron/exon structure similarity to any of 
the sequenced eukaryotic ADHs. In contrast, the 
ADH proteins from yeast, plants and mammals are all 
related to each other, having presumably evolved 
from a common ancestral gene (Jornvall, Persson 
and Jeffrey 1981; Jornvall et al 1984; Suluvan, 
Atkinson and Starmer 1990). With the increasing 
accumulation of new protein sequences it will be in- 
teresting to see whether such events have occurred 
also during the evolution of other taxa. 

C. capitata and D. melanogaster genes have differ- 
ent codon usage patterns: Synonymous codons are 
not used with equal frequency and, often, genes from 
one species share similarities in codon usage (Gran- 
tham et al. 1980, 1981). Moreover, in species showing 
non-random codon usage, individual genes differ 
from each other in the degree, rather than in the 
direction of the bias (Sharp et al. 1988). Nonrandom 
usage of alternative codons can be generated by biases 
in mutation patterns and by selection operating at the 
level of translation. Generally, selection for more ef- 
ficient translation is probably driving the codon usage 
patterns in several genes of prokaryotes and yeast; in 
these species abundantly expressed genes show strong 
codon usage biases while weakly expressed genes have 
a more even synonymous codon representation (Gouy 
and Gautier 1982; Ikemura 1985; Sharp and Li 
1986). In mammals codon usage varies among genes 
mainly in (G + C) content, and specifically in the 
frequency of the dinucleotide CpG, which correlates 
with base composition around the gene and in introns 
(Aota and Ikemura 1988). D. melanogaster genes 
also show considerable codon usage bias (Bodmer and 



TABLE 1 

Base utilization at position III of codons for vitellogenin and 
chorion genes of D. melanogaster and C. capitata 

Species*        A     T     G     C   Total   %(G+C) 

Drosophila 
  YP1          17    70   146   207    440    80.22 
  YP2          19    68   151   205    443    80.36 
  YP3          15    65   139   202    421    80.99 
  s36          25    52    87   123    287    73.17 
  s38          34    75    64   134    307    64.49 

Ceratitis 
  VG1          90   113    83   151    437    53.31 
  VG2          82   137    81   137    421    51.78 
  s36          74   116    42    87    321    40.18 
  s38          60   106    38    78    282    41.13 

* Sources of sequences: Drosophila YP1, YP2 and YP3 have EMBL 
nucleic acid database accession numbers V00248, J01157, and 
M15898, respectively. Drosophila s36 and s38 chorion DNA se- 
quences are from EMBL entry X12635; Ceratitis s36 and s38 
sequences are EMBL entries X51342 and X55886, respectively. 



Ashburner 1984; Ashburner, Bodmer and Lemeu- 
nier 1984). On average, there is a strong deficiency 
of A and a weaker deficiency of T in the third position, 
and a more marked under-representation of NTA 
and NAA codons. Among different D. melanogaster 
genes there is a correlation between degree of syn- 
onymous codon nonrandomness and levels of expres- 
sion, suggesting that translational selection may be 
operating in D. melanogaster, as in prokaryotes and 
yeast (Shields et al. 1988). 

We compared codon usage in D. melanogaster and 
in C. capitata for the vitellogenin genes and for two 
chorion protein genes, s36 and s38 (Konsolaki et al. 
1990; Tolias et al. 1990). A summary of the results 
is shown in Table 1. The D. melanogaster genes exhibit 
the deficiency of A and T in the third position which 
is typical for most abundantly expressed genes of this 
species; the bias is stronger in the vitellogenins (80% 
to 81% G + C in the third position) than in the chorion 
proteins (73% and 64% for s36 and s38, respec- 
tively). In contrast, the vitellogenin genes of medfly 
show a rather even synonymous codon usage (approx- 
imately 50% G + C in the third position), while the 
medfly chorion genes show a slightly reversed bias 
(approx. 40% G + C). In all four medfly genes there 
is also a small bias against GTA, ATA and NCG 
codons, which is not observed in D. melanogaster (re- 
sults not shown). 
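
The %(G + C) values quoted above follow directly from third-position base counts. The following is a minimal sketch of that arithmetic, not the authors' own procedure: the function names and the nine-base example fragment are illustrative assumptions, while the two count tuples are transcribed from Table 1.

```python
from collections import Counter


def third_position_counts(coding_sequence):
    """Count the bases found at the third position of each complete codon."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i + 2] for i in range(0, usable, 3))


def gc3_percent(a, t, g, c):
    """Percentage of codons ending in G or C, given third-position counts."""
    return 100.0 * (g + c) / (a + t + g + c)


# Third-position counts (A, T, G, C) transcribed from Table 1.
drosophila_yp2 = (19, 68, 151, 205)
ceratitis_s38 = (60, 106, 38, 78)

print(round(gc3_percent(*drosophila_yp2), 2))
print(round(gc3_percent(*ceratitis_s38), 2))

# Hypothetical nine-base fragment, used only to show the counting step.
print(dict(third_position_counts("AACTCGTTC")))
```

Working from raw counts rather than from the printed percentages makes it easy to recheck a row against its listed total.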

There are two alternative explanations for the ob- 
served difference between the two species. First, it is 
possible that selective pressure for specific synony- 
mous codons is less strong in the vitellogenin and 
chorion genes of the medfly, because these genes are 
not as abundantly expressed as in D. melanogaster. All 
four genes code for proteins required at high levels 
during oogenesis, and C. capitata, with a life cycle 
twice as long as that of D. melanogaster and a comparable 
egg output, may have lower rates of expres- 
sion of these genes than D. melanogaster. Alternatively, 
selection of synonymous codons may not be operating 
at all in the medfly if its effective population size is 
small. As has been pointed out (Shields et al. 1988), 
the selection coefficients for synonymous codons are 
expected to be very low, and selection is possible as 
long as N_e s > 1, where N_e is the effective population 
size and s is the difference in selection coefficients 
between synonymous codons. Sequence data from 
medfly genes encoding weakly and highly expressed 
proteins may resolve this question. 
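
The population-size argument can be illustrated numerically. The sketch below assumes the criterion reads N_e s > 1 (the product of effective population size and the selection difference between synonymous codons); the function name and both sets of values are hypothetical and chosen only to show the two regimes.

```python
def selection_is_effective(effective_population_size, selection_difference):
    """Return True when N_e * s exceeds 1, the threshold assumed here."""
    return effective_population_size * selection_difference > 1


# Hypothetical values: the same weak selection difference in a large
# and in a small population.
s = 1e-5
print(selection_is_effective(1_000_000, s))  # N_e * s = 10
print(selection_is_effective(10_000, s))     # N_e * s = 0.1
```

With a fixed, very small s, only the larger population crosses the threshold, which is the situation the second explanation envisages for the medfly.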

Differences in codon usage patterns have been ob- 
served even among species of the genus Drosophila. 
In members of the Sophophora subgenus, the gene 
encoding alcohol dehydrogenase exhibits a more 
biased codon usage (similar to that of D. melanogaster) 
than in members of the Drosophila subgenus (Sulli- 
van, Atkinson and Starmer 1990). Taken together 
with the medfly data, these observations should serve 
as a caution whenever phylogenetic distances are in- 
ferred from synonymous substitution rates. To mini- 
mize influences by differences in codon bias, we pro- 
pose that weakly expressed genes with demonstrated 
low codon usage bias be used in such studies. 

We thank F. C. Kafatos for a critical reading of the manuscript. 
This work was supported by a U.S. Department of Agriculture 
grant. 
Analysis of a Vitellogenin Gene of the 
Mosquito, Aedes aegypti and Comparisons to 
Vitellogenins from Other Organisms 
A genomic clone of the Aedes aegypti vitellogenin A1 gene was sequenced, including 2015 bp of 5' 
untranscribed sequence, 6369 bp of open reading frame interrupted by two introns, and a short 3' 
untranslated region. Primer extension was used to identify the transcription initiation site. The amino 
termini of the large and small subunits were located by N-terminal sequencing of vitellin purified from 
eggs. The length of the signal sequence and the position of the cleavage site between the two subunits 
were also determined. Three sequential imperfect repeats were found near the beginning of the small 
subunit. The sequence of the coding region appears to be polymorphic. Comparison of the signal 
sequences of seven insect vitellogenin genes revealed several conserved leucines, and a conserved 
position of an intron. However, the signal sequences are not conserved between these genes and the yolk 
protein genes of Cyclorraphid Dipteran insects. The cleavage sites between the small and large subunits 
in the vitellogenins of the mosquito, A. aegypti, sawfly, Athalia rosae, boll weevil, Anthonomus grandis, 
and silkworm, Bombyx mori are flanked by sequences rich in serine. Pairwise dot matrix analysis at 
the protein level showed that the mosquito, boll weevil and silkworm vitellogenins are significantly 
related with approx. 50% similarity. One region of the three insect vitellogenin genes, near the 
N-terminal of the large subunit, showed the highest levels of similarity, from 57.3 to 64.4%. The 
position of cysteines in insect vitellogenins is conserved, particularly in the C-terminus of the large 
subunit. Dot matrix comparison of the mosquito vitellogenin with that of Xenopus laevis and 
Caenorhabditis elegans showed much lower, but still significant degrees of relationship. Pairwise 
comparisons of the mosquito vitellogenin and the Drosophila melanogaster yolk proteins did not show 
significant similarities. Potential regulatory regions in the mosquito VgA1 gene were identified by 
comparison to regulatory elements known from other organisms, especially D. melanogaster, which 
could provide useful information for further functional analysis. 

Aedes aegypti Hormone response element Sequence comparison Vitellogenin gene Ecdysone 



INTRODUCTION 

The major egg proteins have been the subject of intense 
study for many years. Before the technological revolu- 
tions that allowed the analysis of macromolecules from 
small samples, the massive amounts of vitellogenin and 
ovalbumin produced during egg development, and their 
concentration in the egg, made these proteins the mol- 
ecules of choice for many kinds of biochemical investi- 
gations. A considerable amount is known about the 
biochemistry and physiology of these proteins, which 
forms the basis for the current intensive investigations of 
their molecular biology. 

Evidence for the distant evolutionary relatedness of 
vitellogenin genes of vertebrates and nematodes exists in 
their primary structure, and in the positions of introns 
(Nardelli et al., 1987; Spieth et al., 1991; Trewitt et al., 
1992). However, comparatively little is known about the 
relationships among insect vitellogenin genes, and the 
relationships between insect vitellogenin genes and those 
of other organisms. Genomic sequences are now avail- 
able for the vitellogenin genes of the boll weevil, An- 
thonomus grandis (Trewitt et al., 1992), the silkworm, 
Bombyx mori (Yano et al., 1994b), the mosquito, 
Anopheles gambiae (P. Romans, unpublished), and the 
yolk proteins of the fruit flies, Drosophila melanogaster 
(Hung and Wensink, 1983; Garabedian et al., 1987) 
and Ceratitis capitata (Rina and Savakis, 1991), and 
the blowfly, Calliphora erythrocephala (Martinez and 
Bownes, 1994). cDNA sequences have been obtained for 
the mosquito, Aedes aegypti, vitellogenin gene (Chen 
et al., 1994) and the silkworm (Yano et al., 1994a). 
Partial sequences of vitellogenin genes for the sawfly, 
Athalia rosae (Kageyama et al., 1994); locust, Locusta 
migratoria (Locke et al., 1987); and Gypsy moth, 
Lymantria dispar (Hiremath et al., 1994), are also 
available. 

Among the Diptera there are major differences in the 
molecules used as vitellogenins. In the lower Diptera, 
which include the mosquitoes, vitellogenins resemble the 
molecules seen in other insects and vertebrates where 
they are composed of large glycolipoproteins with large 
(>120 k) and small (~55 k) subunits (Kunkel and 
Nordin, 1985). In the Cyclorraphid Diptera, such as the 
fruit fly, D. melanogaster, the vitellogenins (appropri- 
ately called yolk proteins) are quite different from other 
animal vitellogenins, being composed of several small 
(~45 k) molecules related to mammalian triacylglycerol 
lipase (Baker, 1988; Bownes et al., 1988; Terpstra and 
Ab, 1988). A comparative study of the vitellogenin and 
yolk protein genes could provide insight into the evol- 
ution of these molecules. 

A major focus of current work on vertebrate and 
invertebrate vitellogenins concerns the hormonal control 
of gene expression. This work provides some of the most 
detailed models for endocrine regulation of gene ex- 
pression. Estrogen regulates vitellogenin gene expression 
in the vertebrates (Corthesy et al., 1990). Among the 
insects, juvenile hormone appears to be the most import- 
ant regulatory hormone (Koeppe et al., 1985), with the 
exception of the Diptera where 20-hydroxyecdysone 
substitutes for juvenile hormone in some species (Hage- 
dorn, 1985, 1994). 

In the mosquito, A. aegypti, vitellogenin is synthesized 
in the fat body after a blood meal under the control of 
20-hydroxyecdysone; however, juvenile hormone also 
has effects on vitellogenin synthesis prior to the blood 
meal in ways that are not well understood (Borovsky 
et al., 1985; Racioppi et al., 1986; Martinez and Hage- 
dorn, 1987; Hagedorn, 1994). One of our goals is to use 
molecular techniques to study the hormonal control of 
vitellogenin gene expression. Thus, a comparison of the 
regulatory regions of mosquito genes with other genes 
controlled by steroid hormones could identify potential 
hormone response elements for future functional analy- 
sis. 

This paper presents the genomic sequence analysis 
of an expressed vitellogenin gene of the mosquito, A. 
aegypti. 

MATERIALS AND METHODS 

Animals 

A. aegypti derived from the NIH-Rockefeller strain 
were reared as described by Shapiro and Hagedorn 



(1982). Three- to four-day-old females were fed 
on warmed pig blood (37°C) through a Parafilm mem- 
brane. 

Purification of vitellin from eggs 

Ovaries containing vitellogenic oocytes were dissected 
from females fed 24 h earlier and immediately frozen. 
Ovaries were homogenized at 0°C in a 0.5 M Tris-PO4 
buffer, pH 8.0 containing 0.4 M NaCl and 0.1 mM DFP 
as a protease inhibitor, and the homogenate was cen- 
trifuged for 5 min at maximum speed in a Beckman 
(Palo Alto, Calif.) microcentrifuge at 4°C. The super- 
natant was dialyzed against a 0.05 M Tris-HCl buffer, 
pH 8.0 containing 0.05 M NaCl and bound to a DEAE 
(Whatman-52) column. Vitellin was eluted with a 
0.05-0.5 M gradient of NaCl at 0.25 M. 

N-terminal amino acid sequence 

Vitellin protein prepared as described above was 
separated by PAGE and blotted onto PVDF membrane. 
Regions containing the large and small subunits were cut 
from the membrane and the N-terminal amino acid 
sequence was determined by automated Edman degra- 
dation at the University of Arizona Biotechnology Core 
Facility using an Applied Biosystems 477A Protein 
sequencer interfaced with a 120A HPLC (C-18 PTH, 
reverse-phase chromatography) analyzer to determine 
phenylthiohydantoin (PTH) amino acids. 

Nucleotide sequencing 

DNA sequencing was performed on EcoR I, Hind III, 
EcoR I-Hind III and Hind III-Sal I subclones of the 
VgA1 lambda clone in pBluescript II KS (Stratagene). 
Sequencing primers were T3, T7 (Promega) and several 
synthetic oligonucleotides based on previously obtained 
sequence prepared at the University of Arizona Division 
of Biotechnology, or University of Toronto Zoology 
Molecular Core Facility. Double stranded sequencing 
template DNA was purified over Qiagen columns. 
Sequence was obtained manually using α-35S-dATP, 
>1000 Ci/mmol (Amersham), a Sequenase version 2.0 
kit (United States Biochemical Corp., and Amersham) 
and standard 6% acrylamide urea gels, or by an auto- 
matic DNA sequencer (model 373A, Applied Biosystems 
Inc., Foster City, Calif.) at either the Core Facility for 
Protein/DNA Chemistry, Queen's University, or the 
Division of Biotechnology, University of Arizona. 

Polymerase chain reaction (PCR) 

PCR reactions were carried out to double-check the 
sequences of two regions in the B and C fragments 
(Fig. 1), using either the genomic DNA (0.45 µg tem- 
plate per reaction) of A. aegypti or the cloned plasmids 
(0.06-0.6 µg template per reaction) as template. 
PCR reactions were carried out in a total volume of 
100 µl with 10 µl of 10× buffer (200 mM Tris-HCl, 
pH 8.4, 500 mM KCl), 1.6 mM Mg2+, 0.1 mM dNTPs, 
0.5 µM of each primer, and 2.5-5.0 units of Taq poly- 
merase. Templates were denatured at 94°C for 1 min. 
Ba=BamHI, Bg=BglII, C=ClaI, E=EcoRI, H=HindIII, P=PstI, S=SalI, X=XhoI 

FIGURE 1. Restriction enzyme map of the genomic clone in phage lambda containing the VgA1 gene of A. aegypti. The clone 
is divided into fragments (A to G) defined by EcoR I sites as described by Gemmill et al. (1986). Additional EcoR I fragments 
were identified during remapping and sequencing. The location of the transcribed region (mRNA) is indicated below the 
restriction map. Also shown is the region sequenced. 



Extension was carried out at 72°C and the extension time 
was calculated as 1 kb/min. Annealing temperature was 
54°C. PCR products were sequenced by the Division of 
Biotechnology of the University of Arizona using syn- 
thetic primers and the automatic sequencer. 
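
The reaction set-up involves two simple calculations: the extension time implied by a rate of 1 kb/min, and the final concentration of each buffer component after dilution of the 10× stock. A minimal sketch follows; the function names and the 1500 bp product length are hypothetical, whereas the buffer composition and volumes are those given in the text.

```python
def extension_time_minutes(product_length_bp, rate_bp_per_min=1000):
    """Extension time at the stated rate of 1 kb per minute."""
    return product_length_bp / rate_bp_per_min


def final_concentration(stock_concentration, stock_volume, total_volume):
    """Concentration after diluting stock_volume of stock into total_volume."""
    return stock_concentration * stock_volume / total_volume


# Hypothetical 1500 bp product.
print(extension_time_minutes(1500))       # minutes

# 10 µl of 10x buffer (200 mM Tris-HCl, 500 mM KCl) in a 100 µl reaction.
print(final_concentration(200, 10, 100))  # mM Tris-HCl
print(final_concentration(500, 10, 100))  # mM KCl
```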

RNA preparation 

Total RNA was isolated by modification of the single- 
step RNA isolation method (Chomczynski and Sacchi, 
1987). Adult female A. aegypti were frozen at -80°C 
24-30 h after blood meal, quickly shaken in a cold 
Erlenmeyer flask and sieved to remove the heads which 
contained a pigment that interfered with some of the 
subsequent steps. The thoraces and abdomens were then 
ground in a mortar and pestle on dry ice. Five ml of 
denaturing solution containing 4 M guanidinium thio- 
cyanate was added and after mixing thoroughly, 0.5 ml 
samples were placed into 1.5 ml polypropylene tubes. 
Water-saturated phenol-chloroform extraction and 
ethanol precipitation were carried out essentially as 
described by Chomczynski and Sacchi (1987). RNA was 
purified by (1) re-extracting with phenol-chloroform 
(pH 7.2-7.4) after dissolving the RNA pellet in DEPC- 
treated 0.5% SDS, or (2) using an RNaid plus kit (BIO 
101, Vista, Calif.) following the manufacturer's instruc- 
tions. 

Primer extension analysis 

A 21-mer synthetic oligonucleotide complementary 
to a region immediately upstream of the translation 
initiation site (Fig. 2) was synthesized as above, and 
purified by polyacrylamide gel electrophoresis. 100 ng of 
primer was end-labeled using 10 units of T4 polynucleo- 
tide kinase in a 10 µl reaction containing 3 µl (γ-32P) 
ATP (3000 Ci/mmol, 10 mCi/ml) in 50 mM Tris-HCl 
buffer (pH 7.5) plus 1 mM MgCl2, 5 mM DTT and 
0.1 mM spermidine. After a 10 min incubation at 37°C 
the sample was heated to 90°C for 2 min to inactivate the 
kinase. The final concentration was adjusted to 
100 fmol/µl and the primer was stored at -20°C. 
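
The dilution needed to reach 100 fmol/µl can be estimated from the mass and length of the primer. The sketch below is an approximation under two stated assumptions that are not in the original: an average mass of 330 g/mol per nucleotide, and complete recovery of the 100 ng of primer after labeling. The function names are illustrative.

```python
AVERAGE_NT_MASS = 330.0  # g/mol per nucleotide; assumed approximation


def oligo_picomoles(mass_ng, length_nt, nt_mass=AVERAGE_NT_MASS):
    """Convert an oligonucleotide mass to picomoles."""
    # ng divided by g/mol gives nmol; multiply by 1000 to obtain pmol.
    return mass_ng / (length_nt * nt_mass) * 1000.0


def volume_for_concentration_ul(amount_fmol, target_fmol_per_ul):
    """Final volume that brings amount_fmol to the target concentration."""
    return amount_fmol / target_fmol_per_ul


primer_pmol = oligo_picomoles(100, 21)  # 100 ng of a 21-mer
final_volume_ul = volume_for_concentration_ul(primer_pmol * 1000, 100)

print(round(primer_pmol, 2))      # pmol of primer
print(round(final_volume_ul, 1))  # µl needed for 100 fmol/µl
```

Under these assumptions the labeled primer would be brought to roughly 144 µl; a measured primer concentration should replace the estimate where one is available.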

In the reaction, 100 fmol of 32P-labeled primer and 
10 µg of total RNA were mixed in 100 mM Tris-HCl 
(pH 8.3), 50 mM KCl, 10 mM MgCl2, 10 mM DTT, 
1 mM of each dNTP and 1 mM spermidine in a volume 
of 10 µl. The primer was annealed to the RNA by 
heating the tube to 58°C for 20 min, and allowing 
the sample to cool at room temperature for 10 min. 
One unit of AMV reverse transcriptase was added in 
2.8 mM Na4P2O7 (total volume, 20 µl). The reaction was 
incubated at 42°C for 30 min. An equal volume of 
loading buffer (98% formamide, 10 mM EDTA, 0.1% 
xylene cyanol, 0.1% bromophenol blue) was added 
and the sample was heated at 90°C for 10 min immediately before 
loading onto an 8% polyacrylamide gel containing 7 M 
urea. The gel was dried and submitted to autoradio- 
graphy. 

Ribonuclease protection assay 

DNA of the subcloned B fragment of VgA1 (Fig. 1) 
was digested with Cla I and treated with proteinase K 
(100 mg/ml) for 1 h at 37°C followed by a phenol- 
chloroform extraction and precipitation with ethanol. 
The DNA was dissolved in DEPC-treated water at a 
concentration of 1 µg/µl. A 634 base antisense RNA 
probe, labeled with (α-32P)-UTP, was transcribed from 
the T3 promoter using an RNA transcription kit (Strata- 
gene, La Jolla, Calif.). The DNA template was then 
digested by adding RNase-free DNase I (Boehringer 
Mannheim, Indianapolis, Ind.) at 37°C for 15 min. 
Unincorporated nucleotides were removed by passing 
the reaction over a Quick-spin G-50 column (Boehringer 
Mannheim). The eluted RNA was precipitated with 
ethanol, and dissolved in 100 µl of hybridization buffer 
(40 mM PIPES, pH 6.4, 1 mM EDTA, 0.4 M NaCl, 80% 
formamide). This RNA probe was mixed with 10 µg 
total RNA from blood-fed female A. aegypti, denatured 
at 85°C for 10 min, and transferred to a water bath at 
50°C overnight. The hybridization mixture was incu- 
bated with RNase digestion mixture (300 mM NaCl, 
10 mM Tris-Cl, pH 7.4, 5 mM EDTA, 40 µg/ml RNase 
A, 2 µg/ml RNase T1) at 30°C for 1 h. Then 20 µl of 10% 
SDS and 10 µl of a 10 mg/ml solution of proteinase K 
were added and incubated at 37°C for 30 min. The RNA 
was extracted with phenol-chloroform, precipitated with 
ethanol, resuspended in formamide loading buffer, 
heated at 95°C for 5 min and then analyzed on a 6% 
polyacrylamide gel containing 7 M urea. The gel was 
dried and examined by autoradiography. 
gaattccaccaccaggcagtgctagtgtgcatgaactgaaagatggcgtcttcggtaaag -1956 
ttgttaacatgttcactagtgccacctggtggcaaaatcttgaatttcaacaatgatagc -1896 
ctttcacttattgaacaagtttgtcgaagacgccatctttataagtagtcaagatttgga -1836 
gattacggcaaaccaaaagattaatattcactagcgccatctgatggctaaatttcgaat -1776 
ttatcggctagattaatgatagatctaaagctgctgaaaaactttgccgaatacgtcatc -1716 
tttctgagagaccaggatactgagttatttttaaaacaaaggttccgtgctcactaatgc -1656 
cgcatggtggctaaatggcttttatactgaaaacgcttgagctttgcctaacacgtcgtt -1596 
tttttaggtagtgaggaacctgatataacagaataaaatcgatacaccagcgccgcctgg -1536 
attccgcatatctattaccaccgcatgttagccctttatccaaataaaaaaaattactga -1476 
acatcatgactctctatgttgatgacttttcaataagcatttataataacggtcgtcaat -1416 
ctcaatttcattcgtatatataaccatcgtagtacgaatggtcataaaaagaggttgaag -1356 
tgtgtttggattctccccagacaaaaaatcgaaatgaaaaactgaaacccattttgacaa -1296 
tcgtcggaaaggtctttctgtttagttcactgtaacaaaatgcaatccaaagatatgaaa -1236 
gctttgaaaacggcgaaaagttattatatctaccgttttattacggaaagcttcttattc -1176 
cggacactctactttgtatgagaaacatttcatatgagatgtttcaatttttgctgttca -1116 
aaagttcacacttttgaggttcattttaatcaaaattttctatagatatctatgaaaatt -1056 
tataatactcaactgctttagacgcctctttagaggcttgccaatttcaaataatgattt -996 
gagcttgtcttttttatgattttcatgagctgtccggaatttgaatcaaagtgtccggaa -936 
tatggggcaaaagtaaggaagcgtccggaataagaatcgtgaaaagtccacacatttcta -876 
tttattgaaaatcattcatgtatcagaaacaaaattttcacccaccattcgaaagttaag -816 
tgttttgaaggcttagagccgaggaatcatacagattgatttattttatatgcttcctga -756 
tgggttatacttcattgaggccttaagtgtccgtaatatgaatcaaaacggtacatatgt -696 
catggcaacattttccgaaacctggagataaattacagagcccactcagtatcgcagatg -636 
ccaacagtgtggctatggtaccaaacatcgtgcaaaagtttgtcattttactcaataact -576 
gacttgtataattagtttaagcttattcaaatgctttttacaagctgatgatttcaattt -516 
gcctgtgaaaattctgaactgttgcggcaaatgaaatgcaatcgatagaaacaaataatg -456 
tagcaatagcaaaaaataatcattttttgcttatcttactatcttcaattcacatctgta -396 
gtctcaattgaataatctggaatccattgcaagctaagtaaattcacgtgtgacctagcg -336 
ggaggccaatggtcgagtgaatctttatttcttgaatggcagaaacgatgccatgaatca -276 

Fig. 2— legend on p. 948. 
aatccaggatgcgaaaccgatgcacaagaataccaatacgaaaactaatcccaatacaac -216 

gatcaccaggtgcatcgatgcgaccgcgattactttgttaccatctgttgtgctgcggtt -156 

atcgctttttcgattagaaggcgaacgctgaaccgatcgtatgcttatcatcgcgaacga -96 

aaggatgctgtgaatcactgctgatgggggcaaaaatctacgaaaatgtaagcaatcact -36 

ttcaaatataaacccttccaatggccacagacggtatca^ttctcgttttggtttcaacg 25 

TATA box +1 
agaggaggagaacacacaatcggaacagctgccggatacttgaagacaagATGCTAGCGA 85 
1 M L A K 

AACTACTTCTTCTCGCTTTGGgtaagtgctcccgggaatgttcgcctcaaacttcgaata 145 
5 LLLLALA 

ctattctgttctttcctttcgacttccacagCGGGGCTCACTGCTGCCTACCAATACGAG 205 
12 GLTAAYQYE 

AACTCGTTCAAGGGCTACAATCCTGGCTATAAGGGC TACGATGCTGGCTACAAGGGTTAC 265 
21NSFKGYNPGYKGYDAGYKGY 

=====:====:=:==:===:=:=:==============: 

A*************************** ***** 

GGCTACGATGCTGGCTACAAAGGCTACGGATACGATGCTGGTTACAAAT^ 325 

41GYDAGYKGYGYDAGYKYNNQ 
******************* ************************** 

GGCTACAGTTACAAGAACGGTTTCGAATATGGA^ 385 
61GYSYKNGF EYGYQNAYQAAF 

TATAAGCACCGTCCAAACGTAACCGAATTCGAGTTC^ 445 
81YKHRPNVTEFEFSSWMPNYE 

TACGTCTACAATGTGACGTCCAAGACCATGACCGCTCTGGCGGAATTGGACGATCAGTTO 505 
101 YVYNVTSKTMTALAEL DDQW 

ACTGGTGTTTTCACCCGTGCCTACCTGGTCATCCGTC 565 
121 TGVFTRAY LVI RPK S RD YVV 

GCTTACGTCAAGCAGCCAGAATACGCTGTCTTCAACGAACGCCTGCCACACGGATATGCT 625 
141 AYVKQPEYAVFNERLPHGYA 

ACCAAGTTCTATCACGATATGTTCAAGTTCCAACCAATGCCAATGAGCAG 685 
161 TKFYHDMFKFQPMPMSSKPF 

GGAATCCGTTACCATAAGGGCGCCATCAAGGGTCTGTAC^ 745 
181 G I R Y H K G A I KG LYVEKT I PN 

AATGAAGTCAACATCCTGAAGGCTTGGATCAGTCAGCTGCAGGTTC 805 
201 NEVNI LKAWI SQLQVDTRGA 

AACTTGATGCATTCCAGCAAGCCCATCCATCCTTCCAAGAATG^ 865 
221 NLMHSSKPIHPSKNEWNGHY 

AAGGTTATGGAGCCATTGGTTACCGGAGAATGTGAAACCCATTACGATC 925 
241 KVMEPLVTGECETHYDVNLI 

CCAGCCTACATGATCCAAGCTCACAAACAGTGGGTTCCTCAATO 985 
261 PAYMIQAHKQWVPQGQLRGE 



GATGATCAGTTCATTCAAGTCACCAAGACTCAGAACTTTGACCGTTGCGATCAACGCATG 1045 
281 DDQF IQVTKTQNFDRCDQRM 

GGTTACCACTTTGGATTCACCGGATACAGCGATTTCAGACCCAACACCAACCAAATGGGA 1105 
301 GYHFGFTGYSDFRPNTNQMG 

AATGTTGCCTCCAAGTCTTTGGTTTCATACATGTATTTGACTGGAAACTGGTACAACTTC 1165 
321 NVASKSLVSYMYLTGNWYNF 

ACCATCCAATCGTCCAGCATGATCAACAAGGTGGCTATCGCTCCTTCCCTAGTGAACAAG 1225 
341 TIQSSSMINKVAIAPSLVNK 

GAACCAGCTCTCGTGTACGCTCAGGTTAACATGACCCTCAACGATGTCCACCCTTACGAT 1285 
361 EPALVYAQVNMTLNDVHPYD 

AAGGTCCCAATGGGCCCAGCCGAAGATCTGAAGGTGTTCGTCGATTTGGTTTACAGCTAC 1345 
381 KVPMGPAEDLKVFVDLVYSY 

AACATGCCAAGTGATAAGAAGAACTACGTTCGTCCTGGCAACGAAACCTCCTCCTCCTCA 1405 
401 NMPSDKKNYVRPGNETSSSS 

TCGTCATCTTCTTCATCGTCCTCCTCCTCATCGGAATCTAGCTCGTCCAGCTCTGAATCT 1465 
421 SSSSSSSSSSSESSSSSSES 

GTGGAAAACCCCAAGATCAGCCCCGTTGAGCAGTACAAGCCTCTGCTGGACAAAGTTGAG 1525 
441 VENPKISPVEQYKPLLDKVE 

AAGCGTGGAAACCGCTACCGTCGTGATCTGAATGCCATCAAGGAAAAGAAGTACTACGAA 1585 
461 KRGNRYRRDLNAIKEKKYYE 

GCTTACAAAATGGATC AGTACCGTCTGCACCGTTTGAACGATACTTC ATCCGACTCCAGC 1645 
481 AYKMDQ YRLHRLNDTSSDSS 



AGCTCTGATTCCTCATCATCC AGCTCGTCGGAATCCAAGGAACACCGCAACGGCACTTCT 1705 
501 SSDSSSSSSSESKEHRNGTS 

TCfcTATTCCAGCTCCTC^TCGTCCTCT 1765 
521 SYSSSSSSSSSSSSSESSSY 

TC ATCCTCTTCTTCCTCTTCCTCGGAGTCCTACTCCATTAGCAGCGAAGAGTACTACTAC 1825 
541 SSSSSSSSESYSISSEEYYY 

CAACCAACACCAGCTAACTTCAGCTATGCTCCCGAAGCTCCGTTCCTGCCATTCTTCACC 1885 
561 QPTPANFSYAPEAPFLPFFT 

GGATATAAGGGATACAAC ATCTTCTACGCACGCAATGTTGATGCCATTCGCTCAGTCGGC 1945 
581 G YKGYNIFYARNVDAIRSVG 

AAACOTGTTGAGGAAATCGCCAGCGATCTGGAAAACCC ATCCAACTTGCCCAAAGCCAAC 2005 
601 KLVEEIASDLENPSNLPKAN 

ACCATGAGCAAGTTCAACATTCTGACCCGTGCTATCAGAGCCATGGGATACGAAGACATT 2065 
621 TMSKFNILTRAIRAMGYEDI 

TACGAGCTGGCCC AGAAATACTTTGTTTCGCAGAAAGAACGTCAAGTCGCTC AGTTTTCC 2125 
641 YELAQKYFVSQKERQVAQFS 

GACAAAAAATTCAGCAAGCGCGTTGACGCTTGGGTTACCCTCCGTGATGCTGTTGCTGAA 2185 

661 DKKFSKRVDAWVTLRDAVAE 

GCCGGAACCCCATCCGCTTTCAAATTGAT^ 2245 
681 AGTPSAFKLIFDFI KEKKLR 

GGATACGAGGCTGCCACCGTGATTGCTTCCTTGGCCCAATC 2305 
701 GYEAATVIASLAQS IRYPTE 

(^TCTGCTGCACGAATTCTTCCTCCTGGTTACCAGCGATC 2365 
721 HLLHEFFLLV TSDVVLHQEY 

TTGAATGCCACCGCTCTGTTCGCTTACTCCAACTTCGTC 2425 
741 LNATALFAYSNFVNQAHVSN 

CGTTCAGCTTACAACTACTATCCAGTATTC^ 2485 
761 RSAYNYYPVFSFGRLADADY 

AAGATCATCXSAACACAAGATCGTCCCATGGTTCGCTCACCAG^ 2545 
781 K I IEHKIVPWFAHQLREAVN 

GAAGGAGACAGTGTAAAGATCCAGGTCTACATCCGTTCCCTCGGAAACCTTGGACATCCA 2605 
801 EGDSVKIQVYIRSLGNLGHP 

CAAATCCTGTCGGTATTCGAGCCATACCTGGAGGGTACCATTCAGATCACTGAC 2665 
821 QILSVFEPYLEGTIQITDFQ 

CGCTTGGCCATTATGGTCGCTTTGGACAATC 2725 
841 RLAIMVALDNLVIYYPSLAR 

TCGGTGCTTTACCGTGCCTACCAAAACACTGCCG 2785 
861 SVLYRAYQNTADVH EVRCAA 

GTTCATTTGTTGATGCGCACCGACCCACCAGCTGATATGCTGC 2845 
881 VHLLMRTDPPADMLQRMAEF 

ACTCACCACGACCCAAGGCTCTACGTCCGCGCTGCCGTCAAATCCGCCATTGAAA 2905 
901 THHDPRLYVRAAVKSAIETA 

GCCTTGGC TGACGACTACGACGAAGAC AGCAAGTTGGCCC TTAATGCTAAGGCTGCCATT 2965 
921 ALADDYDEDSKLALNAKAAI 

AACTTCCTGAACCCAGAAGACGTCAGCATTC^ 3025 
941 NFLNPEDVSIQYSFNHIRDY 

GCTTTGGAAAACCTCGAGCTTTCCTACCGTCTGCACTACGGAGAAATCGCTTC 3085 
961ALENLELSYRLHYGEIASND 

CATCGCTACCCAAGTGGACTGTTCTATCATCTGCGCC^ 3145 
981 HRYPSGLFYHLRQNFGGFKK 

TACACCTCGTTCTACTATCTGGTTTCGAGCATGGAAGCTTTCTTCGA 3205 
1001 YTSFYYLVSSMEAFFDIFKK 

CAATACAACACCAAGTACTTCGCTGATTATTACAAATCTGCCGACTACAGCA 3265 
1021 QYNTKYFADYYKSADYSTNY 

TACAACTTTGACAAATACTCCAAGTACTACAAGCAGTACTACTACAGCAAGGACAGCGAA 3325 
1041 YNFDKYSK YYKQYYYSKDSE 

TACTACCAGAAGTTCTACGGACAGAAGAAGGATTACTATAACGATAAGGAGCCATTCAAG 3385 
1061 YYQKFYGQKKDYYNDKEPFK 

TTCACTGCACCACGCATTGCCAAGCTGCTGAACATC 3445 
1081 FTAPRIAKLLNIDAEEAEQL 

GAAGGACAACTGTTGTTCAAACTGTTCAACGGATACTTCTTCACCGCT^ 3505 
1101 EGQLLFKLFNGYFFTAFDNQ 

ACCATCGAAAACCTCCCACATAAGATGAGACA^ 3565 
1121 TIENLPHKMRHLFENLEDGY 

GCTTTTGACGTTACCAAATTCTACCAACAA 3625 
1141 AFDVTKFYQQQDVVLAWPLA 

ACTGGTTTCCCATTCATCTACACCCTGAAGGCC^ 3685 
1161 TGFPFIYTLKAPTVFKFEVD 

GCTTCTGCCAAGACCCACCCGCAAGTGTACAAGATGCCAGCTGGTCA^ 3745 
1181 ASAKTHPQVYKMPAGHPETE 

AACGACGATTTCTTCTATATGCCACAGTCTATTAACGGATC 3805 
1201 NDDFFYMPQSINGSVDVNLL 

TACCACCGCATGGTTGATGCCAAGGTTGGATTTGTCACTC 3865 
1221 YHRMVDAKVGFVTPFDHQRY 

ATCGCTGGTTACCAGAAGAAGCTGCACGGTTATT^ 3925 
1241 IAGYQKKLHGYLPFNVELGL 

GACTTTGTCAAGGATGAGTATGAGTTCGAAT^ 3985 
1261 DFVKDEYEFEFKFLEPKDDH 

CTGCTGTTCCACATGAGCTCGTGGCCATACACTGGATACAAGGACAT^ 4045 
1281 LLFHMSSWPYTGYKDITDMR 

CCGATTGCCGAAAATCCAAATGCCAAGATTGTGCACGATGACAACCAATCTACCAA 4105 
1301 PIAENPNAKIVHDDNQSTKT 

ATGGAACACACTTTCGGTCAGGATATGACCGGTGTTGCTCTGCGATTCCA^ 4165 
1321 MEHTFGQDMTGVALRFHAKY 

GACTTTGATCTGATCAACTTCCAAGAGTTC 4225 
1341 DFDLINFQQFWSLIQKNDFV 

TCGGCTGTGAACTATCCATTCGCTTACCAGCCATATGAATACCATC 4285 
1361 SAVNYPFAYQPYEYHQFNLF 

TACGATTCCXIAGCGTACTCACGCCAAATCGTTCAAGTTC 4345 
1381 YDSQRTHAKSFKFFAYQKFG 

GCTCCTTCTTTCGAAGAAACTGGACCGAAGCACCCAGCCAACCGTC 4405 
1401 A PSFEETGPKHPANRHSYSG 

AACTATTACGAATCGAACTACGCTCAACCCTTCGTCTACAGCTC 4465 
1421 NYYESNYAQPFVYSPGSQRR 

TATGAACAATTCTTCCGCAATGCTGCTTCTGGAATC 4525 
1441 YEQFFRNAASGIRNSFVR YY 



GACTTCXK^TTCGAATTCTACGCTCC^ 4585 
1461 DFGFEFYAPQYKSEFTFTTA 

TTCGCTGATAGTCCAGTTGACAAGACTTCCCGC^ 4645 
1481 FADSPVDKTSRQLYYFYASP 

ATGTTCCCAAGCCAATCGTACTTCAAGGATATTCCATTCAGTGGAAAGC^ 4705 
1501 MFPSQSYFKDI PFSGKQFQF 

TGCGCCACCGCTACCAGTGAATTCCCAC^ 4765 
1521 CATATSEF PRVPYLKFSDFD 

AAATACTACGGAGATGCTAGCCAGTACTTCGATTTCCTGT^ 4825 
1541 KY YGDASQYFDFLYGBSCQG 

GGAGCTCACATTGKTTGTGAAGGGTAAGCAGAAGCAGACTGGAAAGTGCCGCGAATACCTC 4885 
1561 GAHIAVKGKQKQTGKCREYL 

CGATTCTCGGATGTTGCTAAGGC TTGC AAGGAACAGATGGCC AACGGATACTACCAATTC 4945 
1581 RFSDVAKACKEQMANGYYQF 

GAGGAATGCCAACAGGCTATCGATCAGGCGTACTATTACGACTTCTACGATTACGCCATT 5005 
1601 EECQQAI DQAYYYDFYDYAI 

GAGTACAAGGATGTCGGCTCTGTTGCCAAGAATCTGACCAACAAGTTCTACAACTACTTC 5065 
1621 EYKDVGSVAKNL TNKFYNYF 

CAGTACGCGTTCTACCCGTACTTCGAATCGAACTTCTTCTACCATGGAAAGTCCAACTAC 5125 
1641 QYAFYPYFESNFFYHGKSNY 

ATCAAGGCTGAATTCGAATTCGCTCCTTATGG^ 5185 
1661 IKAEFEFAPYGDYYNASFFG 

CCAAGCTACGCCTTCCAGGTTCAGAACTACCCGGTCTTCAATGACTACTCT 5245 
1681 PSYAFQVQNYPVFNDYSTYF 

CCATACTTCTTCAAGTACACTTTCTTCCC^ 5305 
1701 PYFFKYTFFPRYQPYYMHRL 

CGATCGCACAAGCCCCGCAACCGTCCGTACTACGAGCTTTCCAACTACGAA 5365 
1721 PSHKPRNRPYYELSNYEQFA 

ATCTTCGATCGTAAACCACAGTATCgt aagt t caag t aa 1 1 c t gtaag 1 1 1 1 caa 1 1 1 at 5425 
1741 IFDRKPQYP 

taacgCtaattgcttttttcagCTTC^TGCTC^ 5485 
1750 SCSFSNDNFYTF 

GACAACAAGAAGTACTTCTACGATATGGGAGAATGCTGGCATGCAGTGATC 5545 
1762 DNKKYFYDMGECWHAVMYTV 

AAGCCAGACTACGACTTCTATGCTCAACAATCCCACTTCTACAACTC 5605 
1782 KPDY DFYAQQSHFYNSDFEY 

AAGTACAAGAATGGATTTGAAGAGTACGAAC7VGTTCGCTG 5665 
1802 KYKNGFEEYEQFAALARRGS 

GACAATCAGCTATACTTCAAGTTCTTGTTCGGAGACAACTACATTGA^ 5725 



1822 DNQLYFKFLFGDNYIEVFPN 

AACGGTGGTGTTCCATTTGTGAAGTAC AACGGACGTCCATACGACATCAGC AAGAGCAAC 5785 
1842 NG GVPFVKYNGRPYDISKSN 

ATTGCCCACTTCGAATACAAGGAAGGCTACCCAAGCTTCCCCTTCTTCTACGCTTTTGCC 5845 
1862 IAHFEYKEGYPSFPFFYAFA 

TACCCCAACAAGGATTTGGAAGTCAGCTTCTTCGGTGGAAAACTGAAGTTCGCTACCGAT 5905 
1882 YPNKD LEVSFFGGKLKFATD 

GGATACCGTGCTCGCTTCTTCTCGGACTACTCGTTCTACAACAACTTTGTCGGTC 5965 
1902 GYRARFFSDYSFYNNFVGLC 

GGAACCAACAACGGAGAATACTTCGATGAATTTGTCACCGCCGATCAGTGCTACATGCGC 6025 
1922 GTNNGEYFDEFVTADQCYMR 

AAACCTGAGTTCTTCGCTGCTTCCTACGCCATCACTGGACAGAACTGTACCGGACCAGCC 6085 
1942 KPEFFAASYAITGQNCTGPA 

AAGGCCTTCAACTATGCCTACCAAC AGAAGGCC AAGCAGGAATGTGTCAAGCGTGAAGTC 6145 
1962 KAFNYAYQQKAKQECVKREV 

TACTATGGAGACATCATCTACAACCAGGAATACTACCACCCCCGCTACCGCTACTACAAC 6205 
1982 YYGDIIYNQEYYHPRYRYYN 

CACAATGTTGAAGAGTCCTCCAGCTCTTCGTCTAGCTCCTCTTCCGATTCTTCGTCCTC^ 6265 
2002 HNVEESSSSSSSSSSDSSSS 

TCATCTTCTTCGGAATTCAGCTCTCTGGGGCGCTCCGGCAGCTCATCGTCTAGCTCGAGC 6325 
2022 SS-SSEFSSLGRSGSSSSSSS 

TCTGAGGAACAGAAGGAATTCC ACCCACATAAGCAGGAAC AC AGCATGAAGGAATGCCCA 6385 
2042 SEEQKEFHPHKQEHSMKECP 

GTTCAGCATCAGCACCAATTCTTCGAGCAAGGTGACCGC 6445 
2062 VQHQHQFFEQGDRICFSLRP 

CTGCCAGTGTGCCACTCCAAGTGCGCTGCTACGGAAAAGATCAGCAAATACTTCGATGTC 6505 
2082 LPVCHSKCAATEKISKYFDV 

CACTGCTTCGAGAAGGATTCCACCCAGGCTAAGAAGTACAAATCCGAGATTGGCCGCGGC 6565 
2102 HCFEKDSTQAKKYKSEIGRG 

TACACTCCGGACTTCAAGAGCTTCGCCCCACACAAGACTTACAAGTTCAACTACCCGAAG 6625 
2122 YTPDFKSFAPHKTYKFNYPK 

AGCTGTGTCTACAAGGCATACSSfigaaacgacatttgcagatcccatttttgtatgacga 6685 
2142 S C V Y K A Y 

accaatgaactaacgaaataaattataaggcaatttttaaatatgtgttgtttgaattcc 6745 
aatttacgaattgagtcgac 6765 



FIGURE 2. Nucleotide sequence of 8780 bp of the VgA1 clone. Underlined regions are, in order, the TATA box, the 
transcription start site, the first ATG, the stop codon, and the poly A addition signal. N-terminal sequences of both the small 
and large subunits of the protein are double underlined. The imperfect triple repeats of 9-10 amino acid residues are asterisked. 
The introns are shown in lower case letters. Nucleotides are numbered at the right, in reference to the transcription start site 
as +1. Amino acids are numbered at the left. 
Computer analysis 

DNA sequence data were assembled using PC Gene 
and subsequently analyzed using the MacVector sequence 
analysis software of International Biotechnologies Inc. 
and the "Sequence Analysis Software Package" 
version 7.1 from the Genetics Computer Group 
(Devereux et al., 1984). GCG programs used included: 
Pileup for multiple sequence alignment, Gap for pairwise 
sequence comparisons, Blast for database searches, 
Fetch for retrieving sequence data from GenBank, 
Findpattern for finding selected patterns in a particular 
sequence, and Compare and Dotplot for comparison of two 
sequences using dot matrices. 

RESULTS AND DISCUSSION 

Nucleotide sequence and the structure of the vitellogenin 
gene VgA1 in Aedes aegypti 

Four of the five vitellogenin (Vg) genes of the 
mosquito, A. aegypti, have been cloned and mapped 
(Gemmill et al., 1986; Hamblin et al., 1987). One of 
these, VgA1, has been used to follow vitellogenin mRNA 
levels in physiological experiments (Gemmill et al., 1986; 
Racioppi et al., 1986) and is the subject of this 
investigation. Figure 1 shows the revised restriction map of the 
original clone containing this gene. Since Northern blots 
(Gemmill et al., 1986) suggested that the vitellogenin 
coding region began near the 3' end of the EcoRI 
B-fragment, it seemed likely that important regulatory 
elements of the gene would be in this fragment. 
Therefore, 8780 bp of the vitellogenin A1 genomic clone was 
sequenced, including the EcoRI B-fragment (Fig. 2). An 
open reading frame was found to begin near the 3' end 
of the EcoRI B-fragment and extend almost to the 3' 
end of the fragment. 

Primer extension was used to identify the transcription 
initiation site. A 21 base oligonucleotide complementary 
to sequence immediately upstream of the predicted ATG 
(see below) was hybridized to total RNA extracted from 
blood-fed females and extended using AMV reverse 
transcriptase. The results (Fig. 3) indicate a start site 
beginning ATCACTT, 75 nucleotides upstream of the 
ATG. This sequence corresponds well to the consensus 
mRNA cap site derived from D. melanogaster, 
ATCA[G/T]T[C/T] (Hultmark et al., 1986), differing 
only by one nucleotide. Sequence TCACT (+2 to +6) 
is similar to the arthropod initiator consensus TCAGT 
found in the vicinity of the initiation site (Cherbas and 
Cherbas, 1993). A potential TATA box is located 30 
nucleotides upstream from the cap site. This sequence 
differs from the consensus G/(A)TATAAAA (Breathnach 
and Chambon, 1981) by one base and is appropriately 
spaced upstream of the cap site. The ATG at +76 
is the first possible translation start site downstream 
of the cap. This sequence, AAGATGC, contains an 
adenine at position -3, the position most critical for efficient 
translation initiation (Kozak, 1989), although it lacks a 
purine at +4. 
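
The comparison of the observed start site with the consensus cap site can be checked with a few lines of Python. This is only an illustrative sketch: the function name `mismatches` and the list-of-allowed-bases encoding of the consensus are assumptions made here, not part of the original analysis, which used the GCG programs.

```python
def mismatches(site, consensus):
    """Count positions where `site` carries a base the consensus does not allow.

    `consensus` is a list with one string per position; each string holds
    the bases allowed at that position (hypothetical encoding).
    """
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in allowed for base, allowed in zip(site, consensus))


# D. melanogaster cap site consensus ATCA[G/T]T[C/T], as quoted in the text
CAP_CONSENSUS = ["A", "T", "C", "A", "GT", "T", "CT"]
# Arthropod initiator consensus TCAGT, as quoted in the text
INITIATOR_CONSENSUS = ["T", "C", "A", "G", "T"]

cap_differences = mismatches("ATCACTT", CAP_CONSENSUS)
initiator_differences = mismatches("TCACT", INITIATOR_CONSENSUS)
print(cap_differences, initiator_differences)
```

Both comparisons give a single difference, at the C that replaces G/T in the cap site and G in the initiator.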



A 70 base intron interrupting the open reading frame 
in the 11th codon was predicted by consensus rules 
(Breathnach and Chambon, 1981). Its presence was 
confirmed by RNase protection (Fig. 4) in which a major 
pair of fragments define the intron end points at 
positions +107 and +176. The minor pair of fragments 
could be due to expression of another member of the 
gene family, or to a different allele. 

By comparison with the cDNA sequence of A. aegypti 
vitellogenin (Chen et al., 1994), a second, 57 bp, intron 
is predicted between positions +5391 and +5447. 
Both the 5' and 3' splice sites for this intron conform 
to the consensus rules of Breathnach and Chambon 
(1981). 

FIGURE 3. Determination of the transcription start site using primer 
extension. RNA from whole blood-fed females was used as the 
template. A 21 base 32P-labeled oligonucleotide immediately upstream 
of the ATG was used as primer. Lanes 1-4 contain a sequencing 
reaction of the region of interest. The arrow head indicates the position 
of the TATA box. Lanes 6 and 7 contain the primer extended sample. 
Lane 8 contains a 1 kb ladder. 

FIGURE 4. Location of an intron in the signal sequence using RNase 
protection. A 32P-labeled riboprobe was synthesized using T3 RNA 
polymerase from the T3 promoter to the first ClaI site in the B 
fragment. The riboprobe was hybridized to RNA extracted from 
blood-fed females, digested with RNase A and RNase T1 prior to 
separation by PAGE. Lane 1, a 1 kb ladder. Lane 2, the probe. Lane 
4, the RNase treated sample. Lanes 5-8 contain a sequencing reaction 
of the region of interest. 

A stop codon was found at +6647 followed by a poly 
A addition signal beginning at +6702. Therefore, the 
8780 bp sequenced region of the VgA1 genomic clone 
contained 2015 bp of 5' untranscribed upstream 
sequence, a 6369 bp open reading frame interrupted by 
two introns, and a short 3' untranslated region. 
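
The coordinates quoted above can be cross-checked with simple interval arithmetic. The sketch below rests on assumptions stated here rather than in the paper: positions are numbered from +1 with no position 0, intron end points are inclusive, and the stop codon occupies three nucleotides starting at +6647. The helper name `span_length` is hypothetical.

```python
def span_length(start, end):
    """Length of a feature whose first and last positions are both included."""
    return end - start + 1


SEQUENCED_BP = 8780   # total length of the sequenced region
UPSTREAM_BP = 2015    # sequence 5' of the transcription start site (+1)

# Last numbered position of the sequenced region relative to +1
last_position = SEQUENCED_BP - UPSTREAM_BP

intron1_bp = span_length(107, 176)     # first intron, from RNase protection
intron2_bp = span_length(5391, 5447)   # second intron, from the cDNA comparison

# Assumption: the stop codon covers +6647 to +6649
stop_codon_end = 6647 + 2
utr3_bp = last_position - stop_codon_end

print(last_position, intron1_bp, intron2_bp, utr3_bp)
```

Under these assumptions the intron lengths come out as 70 and 57, and the 3' untranslated region as 116 bp, which agrees with the 116 bp figure quoted later for that region.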

Protein product of the VgA1 gene 

The deduced amino acid sequence is also shown in 
Fig. 2. Duplicated sequences were found, beginning at 
residue 29, which consisted of three sequential imperfect 
repeats: 

29 GYKGYDAGYK 38 
39 GY-GYDAGYK 47 
48 GY-GYDAGYK 56 

The nucleotide sequences of these repeats are slightly less 
similar than the amino acid sequences due to the 
degeneracy of codons. 

It has been shown that the two subunits of the 
vitellogenin of A. aegypti originate from a common 
precursor (Bose and Raikhel, 1988). To precisely localize 
the beginning of the secreted peptides, we assumed that 
the N-terminus of vitellin would be the same as for 
vitellogenin. Vitellin was purified from mature eggs and 
the two subunits were separated by SDS-PAGE and 
electrophoretically blotted to PVDF membrane. 
N-terminal sequencing of protein subunits on the blots 
showed that the N-terminal sequence of the smaller 
subunit was: YQYENAFKGYNPGYK. As shown in 
Fig. 2, this sequence was found to begin at amino acid 
17. There was a serine instead of an alanine in the amino 
acid sequence deduced from DNA corresponding to 
position 6 of the above N-terminal sequence. This 
difference could arise because of different expressed 
alleles of the VgA1 gene, or the heterogeneity of 
vitellogenin due to expression of different members of the 
vitellogenin gene family. This evidence suggests that the 
region from the first methionine to amino acid 16 
contains a signal sequence. This also conforms to the 
prediction of the -3, -1 rule of the signal peptide 
cleavage site (Von Heijne, 1990). The first intron 
interrupts the signal sequence in codon 11. 

The N-terminal sequence of the large subunit was 
found to be: DLNAIKEKKYYEAYK. As shown in 
Fig. 2, this sequence is found to begin at amino acid 
residue 469. A cleavage sequence for dibasic processing 
endoproteases (Barr, 1991), RYRR, appears just before 
the beginning of the large subunit. This site was also 
located by Chen et al. (1994) using N-terminal sequences 
of vitellogenin, rather than vitellin, suggesting that the 
N-termini of the vitellogenin subunits are not modified 
after uptake into the egg. 

Variations in the mosquito VgA1 gene 

The mosquito vitellogenin cDNA has been sequenced 
by Chen et al. (1994). The cDNA clones isolated by 
Chen et al. were identified using the EcoRI C-fragment 
of the VgA1 gene as a probe. The Rockefeller strain of 
A. aegypti was also used by this group (A. Raikhel, 
personal communication). They found six base 
substitutions near the 5' end of their 6504 bp cDNA sequence, 
which were suggested to be due to allelic differences in 
the mosquito population. None of these base 
substitutions affected the amino acid residues. Their cDNA 
sequence also showed several single nucleotide 
differences from our genomic sequence at nucleotides +68, 
+2863, +4326, +4872, +5471, +6005, +6064, 
+6292, +6293, +6294, +6295. All 11 of these 
substitutions are at different positions from their six 
substitutions. These 11 substitutions result in changes in 
six amino acid residues. As described above, three 
imperfect repeats of 9-10 amino acids were found in our 
sequence near the beginning of the small subunit 
from residues 29 to 56, while the cDNA sequence 
reported by Chen et al. (1994) showed only two of 
these imperfect repeats. The extra copy of the repeat is 
not likely to be an intron because it does not fit the 
highly conserved 5' and 3' intron splicing sequences, and 
27 bp is too small to be an intron. Mount et al. (1992) 
showed that the smallest mRNA intron in D. 
melanogaster was 51 nucleotides by surveying 209 
introns in the entire database. Our survey of all 18 
introns of three mosquito genera (Aedes, Anopheles, and 
Culex) in the database showed that intron size in 
mosquitoes ranged from 52 to 1737, with a median of 65 
nucleotides. 

The only differences between the sequence of the 
coding region of our genomic clone and the sequence of 
the cDNA of Chen et al. (1994) are the above mentioned 
substitutions and duplication. Of the four members 
of the vitellogenin gene family, only VgA2 has a 
restriction map similar to that of VgA1, and this 
similarity is restricted to the EcoRI C-fragment 
(Hamblin et al., 1987). It is very 
likely that their cDNA and our genomic sequence are 
from the same vitellogenin gene, VgA1. Therefore, in 
addition to the variation at the 5' end of the cDNA 
described by Chen et al. (1994), more substitutions 
and a different number of duplications are present, 
suggesting that the VgA1 gene is polymorphic at many 
positions. 

Conservation among insect vitellogenins 

As described above, several genomic and cDNA se- 
quences of insect vitellogenins have been reported. With 
these sequences, and the genomic sequence of A. aegypti 
vitellogenin available, gene organization as well as the 
deduced amino acid sequences of vitellogenins of these 
insects can be compared. 

Conservation of the overall primary structure among 
insect vitellogenins. As shown in Fig. 5(A), the entire 
amino acid sequences of the three insect vitellogenins 
aligned with each other very well using dot matrix 
comparisons. The entire deduced amino acid sequence of 
mosquito vitellogenin showed 49.9% similarity (S) to 
the boll weevil vitellogenin and 50.6% to the silkworm 
vitellogenin. The boll weevil vitellogenin showed 48.1% 
similarity to the silkworm vitellogenin. The percentage 
of overall identity (I) of each pair of sequences is also 
shown in the legend of Fig. 5. The significance of the 
similarities of these sequences was tested by analyzing 
the quality of each comparison and the average quality 
of 100 randomized comparisons. The adjusted quality of 
each comparison was calculated using the following 
formula of Gribskov and Burgess (1986): 

Q' (adjusted quality) = [Q (quality of each comparison) 
- A (average quality of 100 randomizations)]/SD 
(standard deviation of the 100 randomizations). 

Generally, the Q' value is less influenced by sequence 
length. A Q' of 3.0 or higher is required for con- 
fidence that two sequences are significantly related 
(Gribskov and Burgess, 1986). Sequences that are not 
related have a low Q'. For example, A. aegypti vitel- 
logenin VgAl and Homo sapiens serum albumin have a 
Q' of -0.78. As shown in the legend of Fig. 5(A), all 
three comparisons of insect vitellogenins are highly 
significant. 
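
The adjusted quality is a z-score of the real alignment quality against the randomized alignments, and can be sketched in Python as follows. The function names, the use of the sample standard deviation, and the toy numbers are assumptions made for illustration; the paper obtained Q and the randomized qualities from the GCG Gap program and does not state which form of the standard deviation was used.

```python
from statistics import mean, stdev

SIGNIFICANCE_THRESHOLD = 3.0  # Q' cutoff quoted from Gribskov and Burgess (1986)


def adjusted_quality(quality, randomized_qualities):
    """Return Q' = (Q - A) / SD for one pairwise comparison.

    `randomized_qualities` holds the qualities of the shuffled comparisons
    (100 of them in the paper). SD is the sample standard deviation here,
    which is an assumption.
    """
    average = mean(randomized_qualities)
    spread = stdev(randomized_qualities)
    return (quality - average) / spread


def is_significant(quality, randomized_qualities):
    """True when Q' reaches the 3.0 threshold."""
    return adjusted_quality(quality, randomized_qualities) >= SIGNIFICANCE_THRESHOLD


# Toy values, not taken from the paper
example_q_prime = adjusted_quality(10.0, [4.0, 6.0])
print(round(example_q_prime, 3), is_significant(10.0, [4.0, 6.0]))
```

Because the score is expressed in units of the spread of the randomized comparisons, it depends less on sequence length than the raw quality does.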

As seen in the dot matrix, some regions in the 
vitellogenins are more similar than others. The region 
with the highest sequence identity was found near 
the N-terminal end of the large subunits of all three 
vitellogenins. Yano et al. (1994b) also described a 
conserved region in a similar position. As shown 
in Fig. 6, 27.0% of the amino acids were identical 
between all three sequences, and 69.8% were identical 
between at least two of them. There are also many 
conserved substitutions. The similarities between the 
three vitellogenins in this region were 64.4% (A. aegypti 
vs A. grandis), 60.7% (A. aegypti vs B. mori), and 
57.5% (A. grandis vs B. mori), respectively. It is 
not known if this region has a functional role; however, 
it is likely that regions with specific functions such as 
binding to the vitellogenin receptor would be more 
conserved. 
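
The two identity figures for the conserved region (identical in all three sequences, and identical in at least two) can be computed column by column from an alignment. The following is a minimal sketch; the function name `conservation`, the gap characters, and the choice of alignment length as the denominator are assumptions, and the example uses only the first eight aligned residues as they appear in Fig. 6.

```python
from collections import Counter


def conservation(aligned):
    """Return (% columns identical in all sequences, % identical in at least two).

    `aligned` is a list of equal-length aligned sequences; '-' and '.' are
    treated as gaps. Percentages are relative to the alignment length.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    in_all = in_two_or_more = 0
    for column in zip(*aligned):
        residues = [r for r in column if r not in "-."]
        if not residues:
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        if most_common_count == len(aligned):
            in_all += 1
        if most_common_count >= 2:
            in_two_or_more += 1
    return 100.0 * in_all / length, 100.0 * in_two_or_more / length


# First block of the Fig. 6 alignment (A. aegypti, A. grandis, B. mori)
block = ["WVTLRDAV", "WSIFRDSV", "WMIFRDGV"]
print(conservation(block))
```

For this short block, four of the eight columns are identical in all three sequences and six are identical in at least two.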

In many insects, vitellogenins are composed of 
two types of subunits, one with a molecular weight 
higher than 120 kDa, and one with a molecular 
weight around 55 kDa (Kunkel and Nordin, 1985). 
It has been shown that in the mosquito, sawfly, 
boll weevil and silkworm, the small and large 
subunits of the vitellogenins are both encoded on a single 
mRNA. The four amino acid residues preceding the 
cleavage sites between the small and large subunits are 
relatively conserved among all four insects as shown 
in Fig. 7. The four residues conform to the consensus 
cleavage sequence for dibasic processing endoproteases, 
R(K)XXR (Barr, 1991). In the mosquito, sawfly, and 
silkworm, the sequences flanking the cleavage site 
contain polyserine tracts which are longer in the 
mosquito and sawfly than in the silkworm. The flanking 
sequence in the boll weevil is also relatively serine rich; 
however, no polyserine tract is present. In the mosquito 
the high serine content (9.9%) is mainly due to three 
polyserine regions, two of which flank the cleavage site 
between the subunits. The third one is close to the 
C-terminal end. The silkworm and the boll weevil 
vitellogenins contain 9.8 and 7.9% serine residues, 
respectively. The partial sequence of the sawfly vitellogenin also 
showed a very high percentage of serines. Vertebrate 
vitellogenins are also rich in serine; however, this is due 
to a single long polyserine run in the region of phosvitin 
whose position does not match that of the insect 
polyserine regions. 
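
Serine content, polyserine tracts and the R(K)XXR cleavage motif are all simple to locate in a protein sequence. The sketch below is illustrative only: the function names, the minimum tract length of five, and the toy fragment are assumptions; only the RYRR motif and the large subunit N-terminus DLNAIKEKKYYEAYK come from the text.

```python
import re


def residue_percent(protein, residue="S"):
    """Percentage of `protein` made up of `residue`."""
    return 100.0 * protein.count(residue) / len(protein)


def polyserine_tracts(protein, min_len=5):
    """Return (start, end, tract) for serine runs of at least `min_len`.

    Positions are 1-based and inclusive. The threshold is an arbitrary choice.
    """
    pattern = re.compile(r"S{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(protein)]


# R(K)XXR read as: R or K, any two residues, then R. The lookahead lets
# overlapping candidate sites be reported.
DIBASIC_MOTIF = re.compile(r"(?=([RK]..R))")


def dibasic_sites(protein):
    """Return (1-based start, motif) for every match to R(K)XXR."""
    return [(m.start() + 1, m.group(1)) for m in DIBASIC_MOTIF.finditer(protein)]


# Toy fragment: an invented serine run, then RYRR and the large subunit N-terminus
fragment = "AASSSSSSAARYRRDLNAIKEKKYYEAYK"
print(round(residue_percent(fragment), 1))
print(polyserine_tracts(fragment))
print(dibasic_sites(fragment))
```

In the toy fragment the only R(K)XXR match is RYRR, immediately before the DLNAIKEKKYYEAYK N-terminus, mirroring the arrangement described for the mosquito sequence.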
FIGURE 5. (A, 1-3) Pairwise dot matrix comparisons of the complete amino acid sequences of the vitellogenins of A. aegypti, 
A. grandis, and B. mori. (B, 1-3) Pairwise dot matrix comparisons of the complete amino acid sequences of the vitellogenins 
of A. aegypti, X. laevis, and C. elegans. These were done using Compare and Dotplot of GCG. Window size was 40, and the 
stringency was 19. Percentage similarity (S), percentage identity (I), quality of the comparison (Q), average quality of 
100 randomized comparisons (A), and the adjusted quality of the comparison (Q') as defined in the text were also calculated 
using GAP of GCG, shown as the following: 
A. aegypti WVTLRDAV AEAGTPSAFK LIFDFIKEKK LRGYEAATVI ASIAQSIRYP TEHLLHEFPL 

A. grandis WSIFRDSV AEAGTGPALL NIKKWIETKK IQKTEAAQVI GTLAQSTRPP TEEYMRKFFE 

B. mori WMIFRDGV TQAGTLPAFK QIQSWIENKK IQEEEAAQW VALPRTLRYP TKQIMTQPFN 

CONSENSUS (2) W-IFRD-V AEAGT-PAFK -I — WIE-KK IQ — EAAQVI — LAQS-RYP TE — M — FF- 

CONSENSUSP) W RD-V — AGT — A 1— -I— KK EAA-V L-^-- -R-P FF~ 

LVTSDWLHQ EYLNATALFA YSNFVNQAHV SNRSAYNYYP VFSFGRL . AD ADYKIIEHKI VPWFAHQLRE 
LATETQVRQQ ETLNQTCILS YTNLVHKVYI NRNESHNQFP VHAFGSFYTK KGREFVKTTV IPHLKQELEK 
FARSPAVKDQ MFLNSSALMA ATKLINLGQV NNYTAHSYYP THMYGRL.TH KHDAFVLEEI LPTLAADLKA 

LATS — V — Q E-LN-TAL-A YTNLVN V NN — AHNYYP VH-FGRL-T- K FV 1 -P-LA — L — 

V— Q — LN P G -P h — 

AVNEGDSVKI QVYIRSLGNL GHPQILSVFE PYLEGTIQIT DFQRLAIMVA LDNLVIYYPS LARSVLYRAY 
AISNADNNKI HVMIRALGNI GHKSILNVFQ PYFEGEKQVS QFQRLMMVAC MDRLADCYPH IARSVFYKIY 
TVEYKDSTKA QVYIQAIGNL GHREILKVFA PYLEGKVEIS TYLRTHIVKN LKTLAKLRDR HVRAVLFSIL 

AV DS-KI QVYIRALGNL GH — IL-VF- PYLEG — QIS -FQRL-IV — LD-LA--YP- -ARSVLY-IY 

D — K- -V-I GN- GH — IL-VF- PY-EG R L R-V 

QNTADVHEVR CAAVHLLMRT DPPADMLQRM AEFTHHDPRL YVRAAVKSAI 
QNTAELPEIR WAVHQLIRA NPPVEMLQRM AQYTNTDSQE EVNAAVKSVT 
RNTAEPYPVR VAAIQSIFIS HPTGEMMQAM AEMTHNDPSV EVRAVLKSAI 
QNTAE — EVR VAAVH-L-R- -PP-EMLQRM AE-TH-DP— EVRAAVKSAI 
-NT A R —A -P M-Q-M A- -T — D V-A-- KS-I 

FIGURE 6. Deduced amino acid sequence comparisons within the most conserved regions in the three insect vitellogenins. 
This region is located near the N-terminus of the large vitellogenin subunits. The first three rows are the amino acid sequences 
of vitellogenins of A. aegypti, A. grandis and B. mori, respectively. The fourth row is the sequence conserved between two insects 
(69.8% identity), and the last row is the sequence conserved in all three insects (27.0% identity). 

Runs of polyserines often contain many phosphorylation 
sites. Chen et al. (1994) suggested that the clusters 
of serines in the mosquito vitellogenin could help 
assimilate the high level of phosphate taken up with the 
blood meal. However, as described above, polyserine 
tracts are seen in the vitellogenins of insects with 
different feeding habits. 

The locations of cysteines in vitellogenins of the 
nematode and the vertebrates are relatively conserved, 
especially at the C-termini of the proteins (Spieth et al., 
1991). These authors suggested that this may be due to 
a requirement to form particular tertiary structures 
dependent on the disulfide bonds. The numbers of 
cysteines in the vitellogenins of the mosquito, boll 
weevil, and the silkworm are 20, 20, and 14, respectively. 
Among these, seven are at the same positions in all three 
insects and an additional seven are at the same positions 
in two insects. All seven cysteine positions that are 
conserved in the three insects, and five out of the seven 
that are conserved in two insects, are concentrated near 
the C-terminus within approx. 550 amino acid residues. 

Conservation of the signal peptides and positions of 
introns. Shown in Fig. 8 is a comparison of signal 
sequences of seven vitellogenins with six yolk proteins of 
the Cyclorraphid Diptera. It is clear that the signal 
sequences are similar within the sets of vitellogenins and 
yolk proteins, but not between them. Some of the 
residues are conserved, particularly the leucines at 
positions 6, 7, 9, 10 and 13 (Fig. 8). Within the seven 

a. aegypti yvrpgnetss. ssssssssss sssessssss esvenpki&p veqykplldk vekrgnjotir 

A, grandis NNQQQQPEEL fiNPLDIGNLV YTYGQPKNNQ VHSKLNENLM EDS£SEEJS£E QEMTHRRJRR 

A. rosae No sequence available<— £ sssssssbed QQRGN£ENKN NNQHGHRQJUI 

B. mori AEWPRAGAMR PAQSILY£L£ TKQMTKHYEfi SgSSSSSBSH EFNFPEQHEH PHQSNQRfiJUt 

DLNAIKEKKY YEAYKMDQYR LHRLNDTSfiD SSSSDSSSSS ££E£|CEHRNG TSS YSSSSSS SfiSKSSSESS 
SANfiLTKQWR E££BEWNQQQ QQPRPQLTRA PH&PLLPSMV GYHGK&IKEN KDPDIRQNVE NLVTEI£DEI 
AANQRGQGNQ nS DfififififiSS SSSSSSSSD S SSSSSS ESDS L&££EEYWQ£ RPTLTDAPEA PMLPLFIGYK 
£YMR£JO.VTV HKVLKKRNSE SSSGSSSSSA DSSS TYINDD IPDIDEPAYA ALYKSPQPHA DKKQNAMNAQ 

S YSS5SSSSS 5SYSISSEEY 
KQSJJKTISKH TLDKYTILNT 
GSAAQQSSQV NPVSVAKKLA 
KILQDIAQQL QNPNNMPKfiD 



FIGURE 7. Deduced amino acid sequence comparison of the regions near the cleavage site between the small and large 
vitellogenin subunits of four species of insects showing serine rich regions flanking the cleavage site. The four amino acid 
residues preceding the cleavage site (in bold) are aligned. Serines are underlined. 
MNP LRTLCVMAC LL AVAMG 
MNPLKIFCFLALVI AVASA 

MNPLTIFCLVAVLL SAATA 
MNPLRIVCVALLLAAAGSA 

NPLRI C A h AAA 
FIGURE 8. The signal sequence of the A. aegypti VgA1 precursor protein compared with signal sequences from other insects. 
Conserved residues among either the seven insect vitellogenins or the six Cyclorraphid Diptera yolk proteins, defined as being 
present in more than 50% of the sequences, are shown below. The conserved residues in vertebrate vitellogenin are also shown 
(Blumenthal and Zuker-Aprison, 1987). An underlined amino acid indicates the position of an intron within that codon. The 
asterisk indicates that the exact end of the signal sequence has not been determined. The double asterisk indicates sequences 
derived from a cDNA, thus information about intron position is not available. The Anopheles gambiae sequence is from P. 
Romans (unpublished observations). Other signal sequences are from: boll weevil, A. grandis (Trewitt et al., 1992); silkworm, 
B. mori (Yano et al., 1994a); locust, L. migratoria (Locke et al., 1987); sawfly, A. rosae (Kageyama et al., 1994); the fruit fly 
D. melanogaster (Hung et al., 1983; Garabedian et al., 1987), the Mediterranean fruit fly, C. capitata (Rina and Savakis, 1991) 
and the blowfly, C. erythrocephala (Martinez and Bownes, 1994). The length of the intron in the locust genes has not been 
determined. 



vitellogenin signal sequences, six contain an intron in the 
same location, 10 residues from the initial methionine. 
The size of this intron varies considerably, from 70 in A. 
aegypti to 2057 in A. grandis. The residue at position 11 
(at the intron insertion) is either an alanine or a valine. 
The lengths of the signal sequences of these vitellogenins 
are between 15 and 18 amino acids. 

The signal sequence of yolk protein b (YPb) of C. 
erythrocephala is 19 amino acids long (Martinez and 
Bownes, 1994). The lengths of the signal sequences of the 
other five yolk proteins are unknown because the 
N-terminal sequences of the mature proteins have not been 
reported. As shown in Fig. 8, these six signal sequences 
are very similar; however, the conserved residues are not 
similar to those in the vitellogenins of other insects, 
except for the leucine at position 13 and alanine at 
position 11. 

Signal sequences of vertebrate vitellogenin genes are 
highly conserved (Spieth et al., 1985; Blumenthal and 
Zuker-Aprison, 1987). Most of the conserved residues 
are different from those conserved in the insect vitel- 
logenins and those in yolk proteins. However, some of 
the residues conserved in vertebrates are also conserved 
in the seven insect vitellogenins, particularly the leucines 
at positions 6, 9 and 10 (Fig. 8). 

The mosquito VgA1 gene has two introns. There are 
six introns in both boll weevil and silkworm vitellogenin 
genes. As described above, the position of intron 1 in the 
signal sequence is conserved. The position of the second 
intron in the mosquito is conserved with intron 6 of the 
boll weevil, and intron 5 of the silkworm, splitting the 
codon of a proline. The position of intron 3 of the boll 
weevil is also conserved with intron 2 of silkworm, 
splitting the codon of a leucine. There are, therefore, 
three conserved positions of introns in these insects. 
Trewitt et al. (1992) found that the second and third 
conserved positions in the boll weevil are also conserved 
in introns 12 and 29 of the chicken and frog vitellogenin 
genes. However, because the similarities between insect 
and vertebrate vitellogenins are limited, the comparison 
of intron position between insects and vertebrates may 
be imprecise. 

Relationship between insect vitellogenins and vitellogenins 
of vertebrates and other invertebrates, and the yolk pro- 
teins of Cyclorraphid Diptera 

As described above, insect vitellogenins are very 
similar in their primary structures, suggesting that they came 
from a common ancestor. There is also limited similarity 
between insect, nematode and vertebrate vitellogenins. 
Figure 5(B) shows the dot matrix comparisons of 
vitellogenins between A. aegypti and Xenopus laevis, between 
A. aegypti and Caenorhabditis elegans, and between X. 
laevis and C. elegans. The percentage similarity and 
identity of each comparison, and the quality and the 
adjusted quality of each comparison are shown in the 
figure legend. Clearly, the nematode vitellogenins are 
more similar to those of vertebrates than to those of insects. 
However, Q' values for the comparisons of A. aegypti vs X. 
laevis, and A. aegypti vs C. elegans are both higher than 
3.0, suggesting that they are also related. 

It has been suggested that the yolk proteins in 
Cyclorraphid Dipteran insects such as D. melanogaster (Hung 
et al., 1983; Garabedian et al., 1987), C. capitata (Rina 
and Savakis, 1991) and C. erythrocephala (Martinez and 
Bownes, 1994) are related to mammalian triacylglycerol 
lipase (Baker, 1988; Bownes et al., 1988; Terpstra and 
Ab, 1988). It has been shown that yolk proteins in these 
insects are very closely related to each other in terms of 
amino acid sequence (Rina and Savakis, 1991; Martinez 
and Bownes, 1994). Our analysis indicates that 
similarities between these yolk proteins and vitellogenins of 
other insects are limited. For example, all three yolk 
proteins of D. melanogaster matched to a region around 
the cleavage site of A. aegypti vitellogenin, with 38-40% 
similarity, with 12-19 gaps inserted. The similarities are 
relatively low considering the length of sequences 
compared. The Q' values of these comparisons are 0.47 (for 
YP1), 0.57 (YP2), and 1.73 (YP3). All are less than 3.0, 
suggesting that the similarities between these pairs are 
not significant. Therefore, the relationship of the yolk 
proteins of Cyclorraphid Diptera and the vitellogenins 
of other insects is very distant. 

The phylogenetic relationship between vitellogenins of 
insects, other invertebrates, vertebrates, and the yolk 
proteins of the Cyclorraphid Diptera is a rather complex 
question. Simple sequence comparisons are not sufficient 
to address these problems: detailed phylogenetic ap- 
proaches will need to be employed. Meanwhile, the 
accumulation of molecular data in this field will be very 
useful for this analysis. 

Analysis for potential regulatory regions in the VgA1 gene 

Several lines of evidence suggest that the expression of 
vitellogenin genes of A. aegypti is under the control of 
20-hydroxyecdysone (ecdysone) (Spielman et al., 1971; 
Hagedorn et al., 1973, 1975; Bohm et al., 1978; Borovsky 
et al., 1985; Gemmill et al., 1986; Racioppi et al., 1986; 
Ma et al., 1987). Several regulatory elements have been 
identified in steroid hormone controlled genes. Ecdysone 
receptors of D. melanogaster and Chironomus tentans 
have been cloned and sequenced (Koelle, 1991; Imhof 
et al., 1993). The ecdysone receptor in D. melanogaster 
functions as a heterodimer with ultraspiracle (USP), a 
member of the retinoid X receptor family of 
transcription factors (Yao et al., 1992; Thomas et al., 1993; 
Antoniewski et al., 1994). Antoniewski et al. (1993) 
proposed a revised consensus ecdysone response element 
(EcRE) [PuG(G/T)T(C/G)A(N)TG(C/A)(C/A)(C/t)Py] 
for D. melanogaster based on mutational experiments. 
EcR and ultraspiracle (USP) heterodimer binding sites 
(EcR/USP) have also been identified (Antoniewski et al., 
1994). Moreover, three relatively long regions that 
control tissue specific expression (fat body enhancers) have 
been identified in the YP1, YP2, and Fbp1 genes of D. 
melanogaster (Garabedian et al., 1986; Abrahamsen 
et al., 1993; Antoniewski, 1994). A 9 bp consensus 
sequence was found in the regulatory regions of YP1-3 
genes of D. melanogaster (Logan and Wensink, 
1990). Other transcription factors such as the 
CCAAT/enhancer binding protein (C/EBP), and 
doublesex proteins (DSX) have also been shown to bind to 
the regulatory regions of these yolk protein genes (Burtis 
et al., 1991; Falb and Maniatis, 1992). C/EBP is involved 
in the acquisition of responsiveness to steroid hormones 
in some genes in mammals (Ben-Or and Okret, 1993). 
DSX proteins have been shown to regulate female 
specificity of the expression of the yolk proteins in 
D. melanogaster (Burtis et al., 1991). Furthermore, 
Fos and Jun binding proteins (AP-1 family of 
transcription activators) that are known to interact with 
glucocorticoid receptors (Miner and Yamamoto, 1991) 
in yeast and mammals, have also been found in 
D. melanogaster (Perkins et al., 1988). Finally, hormone 
response elements (HREs) for many other members of 
the steroid hormone receptor superfamily, such as 
glucocorticoid receptor (GR), estrogen receptor (ER), thyroid 
hormone receptor (TR), vitamin D receptor (VDR), 
retinoid X receptor (RXR), and retinoic acid receptor 
(RAR), have also been identified (Martinez and Wahli, 
1991). 

Using computer programs including Findpattern, 
Bestfit, and Gap of the GCG software package, we 
searched the vitellogenin gene for evidence of potential 
regulatory elements, focusing on the 2015 bp of 5' 
upstream region, the introns, and the 3' untranslated 
region. Relatively stringent criteria were used and sites 
that did not show the correct spacing were eliminated. 
Two regions in the EcoRI B-fragment were found with 
multiple copies of sequences with high degrees of identity 
to the consensus sequences of different hormone 
response elements. These elements are more numerous in 
region 1 (-76 to -457 bp) than in region 2 (-708 to 
-1106). Most of the elements in region 1 also have a 
higher degree of identity to the consensus sequences than 
those in region 2. Moreover, the fat body enhancers of 
D. melanogaster YP1 (127 bp, Garabedian et al., 1986) 
and YP2 (343 bp, Abrahamsen et al., 1993) are similar 
to the sequence around region 1, although the similarity 
is not statistically significant. 
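
A search of this kind, scanning both strands for windows that match a degenerate consensus at a minimum number of positions, can be sketched in Python as below. This is not the Findpattern implementation; the function names are hypothetical, the EcRE consensus is rewritten here in IUPAC codes as an assumption (with the less favoured lower-case t of [C/t] treated as fully allowed), and the test sequence is invented.

```python
# IUPAC nucleotide codes needed for the consensus
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# RG[G/T]T[C/G]ANTG[C/A][C/A][C/t]Y rewritten in IUPAC codes (assumed translation)
ECRE_CONSENSUS = "RGKTSANTGMMYY"


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(window, consensus):
    """Number of positions in `window` allowed by the IUPAC `consensus`."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def scan(seq, consensus, min_matches):
    """Return (start, strand, matches) for windows on either strand.

    `start` is the 0-based index of the leftmost base of the window on the
    given (plus strand) sequence.
    """
    seq = seq.upper()
    width = len(consensus)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            n = count_matches(s[i:i + width], consensus)
            if n >= min_matches:
                start = i if strand == "+" else len(s) - width - i
                hits.append((start, strand, n))
    return hits


# Invented test sequence containing one perfect EcRE match at index 3
toy = "TTTAGGTCAATGCACCGGG"
print(scan(toy, ECRE_CONSENSUS, 13))
```

Lowering `min_matches` to 11 would correspond to the 11 of 13 matches reported for several sites in the text; a threshold that loose returns many more candidates, which is why spacing and other criteria were also applied.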

Shown in Fig. 9 is the overall structure of the EcoRI 
B-fragment and a detailed illustration of the putative 
regulatory regions 1 and 2. Within region 1, three short 
stretches of sequences were found to contain several 
HREs, shown as A, B, C. In addition to the HREs 
shown in A, B, C, three other HREs including a copy of 
C/EBP binding site (-457 to -449) were also found 
within region 1. Within region 2, two short stretches of 
sequences were found to contain multiple HREs, shown 
as D, E. In addition to the HREs shown in D and E, 
three copies of EcR/USP binding sites, one DSX binding 
site, and one ERE were also found in other areas of 
region 2. 

An EcRE (+105 to +117, 11 of 13 bp are identical to 
the consensus) was found near the 5' end of the first 
intron. Moreover, a C/EBP site (+103 to +111, 9 of 9 
identical) overlaps this EcRE. An EcRE (+5394 to 
+5406, 11 of 13 identical), and a C/EBP site (+5405 
to +5413, 9 of 9 identical) were also present in the 
second intron. It is also noteworthy that two DSX sites 
(+253 to +261, 7 bp of 9 identical; and +280 to +288, 
8 bp of 9 identical) were found in the coding region 
where the three imperfect repeats are located. Several 
HREs were also found in the 116 bp 3' untranslated 
region including two EcREs (+6684 to +6696 and 
+6731 to +6743, both 11/13), one EcR/USP site 
(+6730 to +6744, 11 of 15 identical), two Fos binding 
sites (+6680 to +6686, and +6757 to +6763, both 6 
of 7 identical) and two CSDs (+6684 to +6692, 8 of 
9 identical, and +6754 to +6762, 7 of 9 identical). 
Deitsch and Raikhel (1993) found four repeated putative 
regulatory elements with variable levels of similarity 
to retinoic acid response elements (RAREs) in the 
vitellogenic carboxypeptidase gene, a fat body specific, 
female specific, ecdysone controlled gene in A. aegypti. 
A single RARE (10/12) was found in the VgA1 gene 
(see Fig. 9) with the correct spacing between half 
sites. 

The presence of these conserved sites may be fortuitous
and must be confirmed with functional evidence,
which is currently being sought.

FIGURE 9. Potential regulatory sequences in the EcoR I B-fragment of the VgA1 gene of A. aegypti. Putative regulatory region
1 is shown with three short stretches of sequences (A, B, C) where regulatory elements concentrate. Putative regulatory region
2 is shown with two short stretches of sequences (D, E) where regulatory elements concentrate. Arrows under the sequence
indicate the locations where specific hormone response elements match. The direction of the arrows represents the direction
of the binding site. CSD: consensus sequence of the enhancer regions of D. melanogaster yolk proteins 1-3, GAATCAATG
(Logan and Wensink, 1990); DSX: consensus binding site for doublesex binding protein, CTACAAAGT (Burtis et al., 1991);
EcRE: ecdysone response element, RG[G/T]T[C/G]ANTG[C/A][C/A][C/t]Y (Antoniewski et al., 1993); EcR/USP (Fbp1):
EcR/USP heterodimer binding site identified in the D. melanogaster Fbp1 gene, GGGTTGAATGAATTT (Antoniewski et al.,
1994); EcR/USP (Hsp 27): EcR/USP heterodimer binding site identified in the D. melanogaster Hsp 27 gene, GGGTTCAATGCACTT
(Antoniewski et al., 1994); ERE: estrogen response element, N A GGTCAN C NN G TGACCN T (Martinez and Wahli,
1991); Fos: binding site for Fos protein, TGACTCA; GRE: glucocorticoid response element, AG[A/G]ACANNNTGTACC
(Martinez and Wahli, 1991); RRE [RARE]: retinoic acid response element, [A/G]GG[T/A]CA in direct repeats or in
palindromes (Martinez and Wahli, 1991); VDRE: vitamin D response element, GGTGANTCACC or TCACCNGGTGA
(Martinez and Wahli, 1991). Other consensus sequences mentioned in the text that are not shown in Fig. 9 include the C/EBP
binding site of the D. melanogaster YP1 gene, TGTTGCAAT (Falb and Maniatis, 1992), and the C/EBP binding site consensus in
mammals, T[T/G]NNG[C/T]AA[T/G] (Falb and Maniatis, 1992).

The sex-determining gene doublesex in the fly
Megaselia scalaris: Conserved structure and
sex-specific splicing

Abstract: The well-known sex-determining cascade of Drosophila melanogaster serves as a paradigm for the pathway
to sexual development in insects. But the primary sex-determining signal and the subsequent step, Sex-lethal (Sxl), have
been shown not to be functionally conserved in non-Drosophila flies. We isolated doublesex (dsx), which is a down- 
stream step in the cascade, from the phorid fly Megaselia scalaris, which is a distant relative of D. melanogaster. Con- 
served properties, e.g., sex-specific splicing, structure of the female-specific 3' splice site, a splicing enhancer region 
with binding motifs for the TRA2/RBP1/TRA complex that activates female-specific splicing in Drosophila, and con- 
served domains for DNA-binding and oligomerization in the putative DSX protein, indicate functional conservation of 
dsx in M. scalaris. Hence, the dsx step of the sex-determining pathway appears to be conserved among flies and proba- 
bly in an even wider group of insects, as the analysis of a published cDNA from the silkmoth indicates. 

Key words: sex-determining cascade, splice regulation, DNA-binding domain, oligomerization. 


Introduction 

Sex-determining mechanisms defy the expectation that 
common basic biological functions use common genetic 
pathways. Primary sex-determining signals, as well as subse- 
quent signal processing steps can be different even in related 
species. Flies are a group in which various primary sex-
determining mechanisms have been discovered. In Drosophila
melanogaster, the primary sex-determining signal is the ratio
of X chromosomes to autosome sets, the X/A ratio (Bridges
1925). A number of X-chromosomal, so-called numerator genes
and one or more autosomal denominator genes interact
to decide upon the female or male developmental pathway
of the embryo (review: Cline and Meyer 1996). In
Chrysomya rufifacies, sex is determined by a maternal factor;
unisexual progenies, all-female or all-male, are produced
depending solely on the genotype of the mother
(Ullerich 1984). Presence or absence of an epistatic Maleness
factor is the primary sex-determining signal in several
other fly species, e.g., Ceratitis capitata (Willhoeft and
Franz 1996), Lucilia cuprina (Bedo and Foster 1985), and
Megaselia scalaris (Mainx 1966). In M. scalaris, the Maleness
factor can change its location in a transposon-like fashion
within the genome (Traut and Willhoeft 1990; Traut
1994). In the house fly, Musca domestica, several primary
sex-determining mechanisms coexist (Dübendorfer et al.
1992).

Another known sex-determining cascade is that of the
nematode Caenorhabditis elegans. Though it superficially
resembles that of D. melanogaster, transmission of the primary
signal is mediated via steps of transcriptional regulation
instead of splice regulation, and by a non-homologous set of
genes (reviewed in Kuwabara 1999). Comparative analyses
in other fly species indicate that the sex-determining pathway
of Drosophila is not even conserved among flies. Apart
from the primary signal, the subsequent step in the cascade
is also different. Sex-lethal (Sxl) serves this function and is
sex-specifically spliced in Drosophila but not in non-
Drosophila species like Chrysomya rufifacies (Müller-
Holtkamp 1995), Ceratitis capitata (Saccone et al. 1998),
Musca domestica (Meise et al. 1998), and Megaselia
scalaris (Sievert et al. 1997; Sievert et al. 2000). Hence, it
does not transmit the sex-determining signal in these species.
In contrast, doublesex (dsx), which functions as a double switch
at the bottom of the cascade, initiating either female or male
somatic development, appears to be structurally and functionally
conserved in the Queensland fruit fly Bactrocera tryoni
(Shearman and Frommer 1998) and M. scalaris (Sievert et
al. 1997; this paper).

Here, we report on the functional organization of dsx in
M. scalaris and discuss the role of dsx in sex determination
of insects.

Material and methods 

General methods 

Genomic DNA and RNA were isolated from adult flies of the 
M. scalaris wild-type strain Wien, which has been kept in the labo- 
ratory for more than 30 years (Mainx 1964). Culture conditions 
were as described by Willhoeft and Traut (1990). 

For molecular methods, we used standard procedures 
(Sambrook et al. 1989) unless otherwise noted. Genomic DNA 
from female and male flies was isolated as described by Blin and 
Stafford (1976). Total RNA was isolated with the Trizol reagent 
(Life Technologies, Eggenstein, Germany). Poly(A)+ RNA was
prepared from total RNA using the PolyATtract mRNA Isolation
System IV (Promega Biotech, Madison, Wis.). The following
oligonucleotides were custom-synthesized by MWG Biotech
(Ebersberg, Germany): DSXf562R (TTCCTGGGAGGATCGTAGGG),
DSXFbr (CTATTCCATTCCGGTTTCCG), DSXgb996R
(GTTGCAAAACATAGAGTGGC), DSXK362R (ATCAAATTTCATATGGCAAG),
and DSXK788F (TTGAGCAATTTCGATTTCCC).

For automated sequencing, we used the ABI PRISM™ 
Sequenase Terminator Double-Stranded DNA Sequencing Kit and 
the ABI PRISM™ dRhodamine Terminator Cycle Sequencing 
Ready Reaction Kit (PE Applied Biosystems, Weiterstadt, Ger- 
many). 

Northern and Southern hybridization 

Poly(A)+ RNA from adult female and male M. scalaris was
separated in denaturing agarose gels and blotted to a non-charged
nylon membrane (Schleicher and Schuell, Dassel, Germany) using a
vacuum blotting device and 20× SSPE (3.6 M NaCl, 200 mM
NaH2PO4, 20 mM EDTA, pH 7.4) as transfer buffer. Hybridization
was performed at 42°C for 24 h in a solution containing 45% (v/v)
deionized formamide, 5× SSC [1× SSC is 150 mM NaCl, 15 mM
sodium citrate, pH 7], 5× Denhardt's solution [1× Denhardt's solution
is 0.1% (w/v) polyvinylpyrrolidone, 0.1% (w/v) BSA, 0.1%
(w/v) Ficoll 400], 0.1% (w/v) SDS, and 100 µg/mL salmon testis
DNA (sonicated and denatured; Sigma-Aldrich Chemie GmbH,
Deisenhofen, Germany). Washes were performed at 42°C in 0.1×
SSC, 0.1% (w/v) SDS.
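
The working concentrations implied by the "5×" and "0.1×" notation follow directly from the 1× SSC definition given above. The short sketch below only scales the stated 1× recipe; the function and variable names are illustrative and not part of the original protocol.

```python
# 1x SSC as defined in the text, in mM.
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def at_strength(recipe_1x, fold):
    """Scale a 1x recipe (component -> mM) to the given fold concentration."""
    return {component: conc * fold for component, conc in recipe_1x.items()}


hybridization_ssc = at_strength(SSC_1X_MM, 5)   # 5x SSC in the hybridization mix
wash_ssc = at_strength(SSC_1X_MM, 0.1)          # 0.1x SSC in the washes

for label, recipe in (("5x", hybridization_ssc), ("0.1x", wash_ssc)):
    print(label, {name: round(value, 3) for name, value in recipe.items()})
```

Under these definitions the hybridization mix contains 750 mM NaCl and 75 mM sodium citrate from the SSC component, and the wash contains 15 mM NaCl and 1.5 mM sodium citrate.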

For Southern blots, 5 µg genomic DNA from female or male
flies was digested with BamHI, BglII, EcoRI, HindIII, or XbaI
and separated in a 1% agarose gel. After 40 min incubation in a
denaturing buffer containing 0.5 M NaOH and 1.5 M NaCl, the
DNA was blotted to a non-charged nylon membrane using a
vacuum device and the denaturing buffer as the transfer solution.
Hybridization was performed at 68°C for 24 h in a solution
containing 0.5 M Na2HPO4-NaH2PO4 (pH 7.2) and 7% (w/v) SDS.
Washes were performed at 68°C in 0.1 M Na2HPO4-NaH2PO4
(pH 7.2), 0.1% (w/v) SDS.

Probe DSX5CS is a HindIII fragment of pMSWc178 (position
1-773) labeled by random priming with [α-32P]dCTP. Southern
hybridization was visualized using a Bio-Imaging Analyzer BAS-
1000 (Fujifilm) and the accompanying analytical software pcbas
(Raytest, Straubenhardt, Germany).



Isolation of a genomic sequence 

A genomic fragment of M. scalaris dsx was obtained by nested
PCR on 300 ng genomic DNA from female and male flies with
primers DSXK788F/DSXf562R in the first and DSXK788F/DSXFbr
in the second round of amplification. We used 5 U Taq DNA
polymerase in the buffer provided by the supplier (Life Technologies,
Eggenstein, Germany), and 200 µM of each dNTP, 1.5 mM MgCl2,
and 0.2 µM of each primer. Cycling parameters in the first round
were: 4 min initial denaturation at 94°C, hot start, 30 cycles of 30 s 
at 94°C, 30 s at 58°C, and 3 min at 72°C. In the second round: 
4 min initial denaturation at 94°C, hot start, 30 cycles of 30 s at 
94°C, 30 s at 55°C, and 1.5 min at 72°C, followed by 10 min final 
extension at 72°C. Four independent clones with identical intron 
sequence were isolated: pMSW2471 and pMSW2472 from DNA 
of female flies, pMSW2473 (Acc. No. AF283697) and pMSW2474 
from DNA of male flies. 



Reverse transcription PCR (RT-PCR) 

Poly(A)+ RNA was reverse transcribed using a NotI(dT)18
primer and the First Strand cDNA Synthesis Kit (Pharmacia AB,
Uppsala, Sweden). PCR was performed with 5 U Taq DNA
polymerase in the buffer provided by the supplier, 200 µM of each
dNTP, 1.5 mM MgCl2, and 0.2 µM of each primer. The cycling
parameters for PCR were: 4 min initial denaturation at 94°C
followed by 38 cycles with 45 s at 94°C, 45 s at 53°C (primers
DSXgb996/DSXK362R) or 60°C (DSXK788F/DSXf562R), and 1.5 min
at 72°C, and a final extension of 10 min at 72°C.



Construction of a cDNA library and screening for 
M. scalaris dsx 

For construction of cDNA libraries from female and male 
M. scalaris, we used the Gigapack III Gold Cloning Kit 
(Stratagene, La Jolla, Calif.) according to the manufacturer's
instructions. Roughly 1 × 10^6 plaques were screened with a RACE
(rapid amplification of cDNA ends) generated cDNA probe from
M. scalaris dsx (Sievert et al. 1997).

Sequence tools 

blastx 2.0.11 (Altschul et al. 1997) was used for searches 
against the nr database (containing all nonredundant GenBank 
CDS translations, PDB, SwissProt, PIR and PRF entries), tblastn 
2.0.14 (Altschul et al. 1997) for searches against the nr database 
and the dbEST database at NCBI (Bethesda, Md., accessed July
2000, http://www.ncbi.nlm.nih.gov/BLAST/; Altschul et al. 1997).
Multiple sequence alignment was carried out using clustalw 
(http://www2.ebi.ac.uk/clustalw, Thompson et al. 1994) using de- 
fault parameters, the BLOSUM62 matrix (Henikoff and Henikoff 
1992), and subsequent manual improvement. Phylogenetic analysis 
of protein sequences was performed with the neighbor-joining 
method of Saitou and Nei (1987), provided by the clustalw tool. 
The resulting tree was visualized with treeview (Page 1996,
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).

Fig. 1. (A) Structure of Megaselia scalaris dsx cDNA. pMSWc178 was isolated from the cDNA library of female flies, pMSWc233
from the library of male flies. Triangles, start and stop codons of the ORFs; horizontal arrows, primer positions; vertical arrows,
polyadenylation signals. (B) Composition of the putative female-specific protein DSX f and the male-specific protein DSX m.

Fig. 2. RT-PCR on poly(A)+ RNA from female and male flies
(A) with primers DSXgb996 defined from the 3' common segment
and DSXK362R from the 3' extension of pMSWc233 and
(B) with primers DSXK788F from the 5' common segment and
DSXf562R from the 3' common segment of Megaselia scalaris
dsx. + and - indicate presence or absence of reverse transcriptase
in the reaction mix.

Fig. 3. (A) Northern blot of poly(A)+ RNA from female and
male flies, hybridized with probe DSX5CS from the 5' common
segment of Megaselia scalaris dsx; autoradiography. (B) Southern
hybridization of DSX5CS to genomic DNA from female and
male flies digested with restriction enzymes as indicated;
phospho-imaging.

Results

RNA and cDNA of M. scalaris dsx

A 189-bp cDNA fragment of M. scalaris dsx had been
isolated previously (Sievert et al. 1997). 3' RACE on
RNA of female flies with primers defined from this fragment
yielded a cDNA of approximately 780 bp (Sievert et al.
1997). This was used as a probe to screen cDNA libraries
from poly(A)+ RNA of female and male M. scalaris. One
clone of 2.7 kb (pMSWc178, Acc. No. AF283695) was isolated
from a cDNA library of female flies and three clones
of 2.4 kb (pMSWc184), 2.5 kb (pMSWc234), and 3.0 kb
(pMSWc233, Acc. No. AF283696) from a cDNA library of
male flies. The three cDNA clones from males were identical,
except for the different extensions towards the 5' end.
All four clones contained the previously isolated 189-bp
fragment of M. scalaris dsx.

Similarly to dsx in D. melanogaster and B. tryoni (Burtis 
and Baker 1989; Shearman and Frommer 1998), M. scalaris 
dsx transcripts are different in females and males. Three seg- 
ments can be distinguished in dsx cDNAs of M. scalaris 
(Fig. 1A). The dsx cDNAs of female and male flies share a 5' 
common segment, which in the longest cDNA has a length 
of 949 bp. They differ in a female-specific segment of 635
bp (position 899-1533 in pMSWc178), which follows the 5'
common segment in the female-derived cDNA but is absent
in the three male-derived clones. The remaining 3' common
segment up to the polyadenylation site in the female-derived
clone is present in all cDNAs. The three male-derived
cDNAs are extended 904 bp downstream of that point, but
this 3' extension is neither part of the open reading frame
(ORF) nor male specific.

Fig. 4. Alignment of the putative DSX protein sequences from Megaselia scalaris (Acc. No. AF283695), Drosophila melanogaster
(Acc. No. P23022), Bactrocera tryoni (Acc. No. AF029675), and Bombyx mori (Acc. No. AV398350). (A) The part of the proteins that
is common to DSX f and DSX m ; (B) female-specific part of DSX f , the B. mori sequence is truncated; (C) male-specific part of DSX m
from M. scalaris (Acc. No. AF283696), D. melanogaster (Acc. No. P23023), and B. tryoni (Acc. No. AF029676). Black boxes, amino
acids identical to the M. scalaris sequence; shaded boxes, amino acids similar to the M. scalaris sequence; DBD/OD1, DNA-binding
and oligomerization domain 1; asterisks, residues whose replacement abolishes DNA binding activity; OD2, oligomerization domain 2
(the extension of the male-specific part of OD2 in M. scalaris is not clear, the label refers to D. melanogaster and B. tryoni); arrowheads,
additional conserved region.

RT-PCR with primers DSXgb996/DSXK362R (for primer 
positions, see Fig. 1A) amplified a fragment of the 3' extension
in poly(A)+ RNA from both female and male flies
(Fig. 2A; Fig. 1A, RT-PCR product). Presence or absence of
the 3' extension in the 3' common segment appears to depend
on the use of two alternative (but not sex-specific) polyadenylation
sites: the first one with a polyadenylation signal
20 bp before the poly(A) stretch in pMSWc178, the second
with a series of three polyadenylation signals at 25 bp, 29
bp, and 39 bp upstream of the poly(A) tail in pMSWc233. In
spite of the multiple polyadenylation signals, poly(A) tails
start at identical positions in the three male-derived cDNA 
clones. 

We performed RT-PCR on poly(A)+ RNA from females
and males with primers DSXK788F/DSXf562R, which
span the female-specific segment. It amplified fragments
compatible with the expected sizes of 1363 bp from females
and 728 bp from males (Fig. 2B). This is evidence for the
presence of the female-specific segment in female and its
absence in male poly(A)+ RNA.
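
The size argument can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (the female-specific segment at positions 899-1533 of pMSWc178 and the two expected product sizes); the function and variable names are illustrative.

```python
def segment_length(first: int, last: int) -> int:
    """Length of a segment given 1-based, inclusive coordinates."""
    return last - first + 1


# Female-specific segment, positions 899-1533 in pMSWc178 (from the text).
female_specific_bp = segment_length(899, 1533)

# Expected RT-PCR product sizes with primers DSXK788F/DSXf562R (from the text).
expected_female_bp = 1363
expected_male_bp = 728

# If the male product simply lacks the female-specific segment,
# the two products should differ by exactly that segment's length.
size_difference = expected_female_bp - expected_male_bp
print(female_specific_bp, size_difference)
```

Both values come out at 635 bp, which is the length reported for the female-specific segment.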

Northern hybridization with probe DSX5CS from the 5' 
common segment detected one prominent transcript in 
poly(A)+ RNA from each of the female and male flies
(Fig. 3A). The female-specific transcript, dsx f , was roughly
0.7 kb larger than the male-specific transcript, dsx m . The size
difference corresponds to the size of the female-specific seg- 
ment of 635 bp and confirms the presence of sex-specific dsx 
transcripts in females and males. 

The dsx poly(A)+ RNAs are approximately 1 kb larger
than the cDNAs plus a putative poly(A) tail. Hence, al- 
though the cDNA clones contain the complete ORF (see be- 
low), they are incomplete at the 5' untranslated region 
(UTR). Probably, the difference is accounted for by a long 5' 
UTR, similar to that in D. melanogaster (Burtis and Baker 
1989). 

Megaselia scalaris dsx is a single-copy gene 

Probe DSX5CS from the 5' common segment was hybrid- 
ized to genomic DNA of female and male flies, digested 
with one of five different restriction enzymes that do not cut 
within the probe sequence (Fig. 3B). The probe detected one 
fragment in each DNA sample. Therefore, M. scalaris dsx is 
most likely a single-copy gene, and the sex-specific transcripts
dsx f and dsx m are alternative splice products.



Table 1. Six sequences from the female-specific exon of Megaselia
scalaris dsx with similarity to the 13-nt splicing enhancer
repeat elements of Drosophila melanogaster dsx (Burtis and
Baker 1989; Inoue et al. 1992).

Position    Sequence         Identity
247         TCTTCAATCAACA    13/13
291         cCATCAATCAACA    12/13
320         TCATaAgTCAACA    11/13
275         TCAACAtTCAAtc    10/13
602         atATCAATCAAtA    10/13
269         caTTCA-TCAACA    10/13

Note: Position, distance downstream of the alternative splice site of the
female-specific exon; identity, number of nucleotides (upper case letters)
matching the consensus TCWWCRATCAACA (where W is A or T, and
R is A or G) from D. melanogaster.
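
The identity column of Table 1 can be recomputed from the sequences and the degenerate consensus given in the note. The following is a minimal sketch, not the authors' procedure: it assumes each listed site is already aligned position by position to the 13-nt consensus, treats the gap character as a mismatch, and ignores letter case. The names `IUPAC`, `identity`, and `table_1` are illustrative.

```python
# IUPAC codes needed for the consensus in the table note.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG"}
CONSENSUS = "TCWWCRATCAACA"


def identity(site: str, consensus: str = CONSENSUS) -> int:
    """Count positions of `site` that are compatible with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base.upper() in IUPAC[code] for base, code in zip(site, consensus))


# Position (bp downstream of the alternative splice site) -> sequence, from Table 1.
table_1 = {
    247: "TCTTCAATCAACA",
    291: "cCATCAATCAACA",
    320: "TCATaAgTCAACA",
    275: "TCAACAtTCAAtc",
    602: "atATCAATCAAtA",
    269: "caTTCA-TCAACA",
}

for position, site in table_1.items():
    print(position, site, f"{identity(site)}/{len(CONSENSUS)}")
```

Because the count is made against the degenerate consensus, A or T at the W positions and A or G at the R position all score as matches.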



The putative DSX proteins 

The female-specific transcript contains an ORF that codes 
for a putative female-specific protein, DSX f , of 310 amino 
acids. It starts in the 5' common segment (ATG at position
58 in pMSWc178) and ends in the female-specific segment
(TGA at position 988 in pMSWc178). The male-specific
transcript includes a longer ORF that codes for a putative 
male-specific protein, DSX m , of 573 amino acids. The ORF 
starts at the same site in the 5' common segment (ATG at po- 
sition 109 in pMSWc233) as in the female-specific transcript 
but ends in the 3' common segment (TAG at position 1828 of 
pMSWc233). 

Megaselia scalaris DSX f and DSX m proteins share an N-
terminal part of 280 amino acids (common part, Fig. 1B) but
differ in a sex-specific C-terminal part, which is short in 
DSX f (30 amino acids) and long in DSX m (293 amino ac- 
ids). The same pattern of sex-specific protein composition is 
found in DSX f and DSX m from D. melanogaster and 
B. tryoni (Burtis and Baker 1989; Shearman and Frommer 
1998). 
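
The protein lengths quoted above are consistent with the reported start and stop codon positions. The sketch below assumes that each quoted position refers to the first nucleotide of the respective codon (an assumption, since the text does not say so explicitly); the function name is illustrative.

```python
def orf_codons(start_pos: int, stop_pos: int) -> int:
    """Number of sense codons from the first base of ATG up to,
    but not including, the first base of the stop codon."""
    coding_nt = stop_pos - start_pos
    if coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return coding_nt // 3


dsx_f_aa = orf_codons(58, 988)     # pMSWc178: ATG at 58, TGA at 988
dsx_m_aa = orf_codons(109, 1828)   # pMSWc233: ATG at 109, TAG at 1828

COMMON_AA = 280  # shared N-terminal part reported in the text
print(dsx_f_aa, dsx_m_aa, dsx_f_aa - COMMON_AA, dsx_m_aa - COMMON_AA)
```

Under that assumption the ORFs encode 310 and 573 amino acids, leaving sex-specific C-terminal parts of 30 and 293 amino acids after the common 280.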

A blastx search with pMSWc178 in the nr sequence database
of NCBI returned the DSX entries from B. tryoni and
D. melanogaster with highest scores. In a tblastn search in
the dbEST database, two entries from cDNA libraries of
Bombyx mori, submitted by K. Mita, M. Morimyo, T.
Shimada, K. Okano, and S. Maeda (Chiba, Japan), showed
highest similarity. Both were derived from female tissues. 
blast searches with the male-derived pMSWc233 sequence 
did not return further significant entries. 

We selected the DSX f proteins of the insects 
D. melanogaster, B. tryoni, and the translated sequence of 
one of the B. mori cDNAs (Acc. No. AV398350) for a com- 
parison with DSX f of M. scalaris. M. scalaris DSX f has 
the highest similarity with D. melanogaster DSX f (58% identity,
79% similarity; calculations are based on the shorter
M. scalaris sequence). The protein alignment in Fig. 4A and
4B reveals high similarity among all four proteins at both 
ends, whereas the central region is barely conserved. 
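
One way to obtain a percent identity "based on the shorter sequence" from a pairwise alignment is sketched below. This is an assumed reading of the normalization, not the authors' documented procedure, and the two aligned strings are hypothetical toy sequences rather than DSX data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned residues as a percentage of the shorter
    ungapped sequence length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts only if both residues are equal and not a gap.
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * matches / shorter


# Hypothetical toy alignment ('-' marks a gap).
toy_a = "MVSE-NWKRC"
toy_b = "MVSDDNWRRC"
print(round(percent_identity(toy_a, toy_b), 1))
```

A similarity percentage would additionally require a definition of which substitutions count as similar, for example a threshold on a substitution matrix, which the text does not specify.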



The N-terminal conserved region ranges from the N-
terminus up to and including the zinc-finger-like DNA-
binding and oligomerization domain (DBD/OD1) of
D. melanogaster DSX (Erdman and Burtis 1993; An et al.
1996). In DBD/OD1, six amino acids whose replacement by
another residue has been shown to abolish DNA-binding activity
of the DSX protein (Erdman and Burtis 1993) are
conserved in the DSX homologues from M. scalaris,
D. melanogaster, B. tryoni, and B. mori (Figs. 4A and 7,
asterisks).

Fig. 5. Alignment of repeats #1 and #2 from the 3' common segment
of Megaselia scalaris dsx. Bold letters represent the corresponding
amino acid sequence from DSX m.

There are two highly conserved segments in the C- 
terminal region. The first one, ranging from M. scalaris 
DSX amino acid positions 179 through 216 (Fig. 4A, arrowheads),
is not yet functionally defined. It contains a proline-
and serine-rich region, but conservation is not restricted to
these amino acids. The second segment corresponds to the
oligomerization domain 2 (OD2) of D. melanogaster DSX
(An et al. 1996). It consists of a part that is common to
DSX f and DSX m (Fig. 4A, OD2, common part) and a female-
specific part (Fig. 4B, OD2, female-specific part).

It is interesting to note that even the stop signal is conserved;
the coding sequence of dsx f in all three fly species is
terminated by TGATAA, the two stop codons opal and ochre 
in succession. There is no information on the stop signal in 
B. mori, as the dbEST sequence is truncated at the 3' end. 

In contrast to the female-specific part of DSX f , the male- 
specific part of DSX m shows rather low similarity with the 
corresponding parts of DSX in D. melanogaster and 
B. tryoni (Fig. 4C). They contribute to the male-specific 
OD2 in D. melanogaster and B. tryoni, but it is unclear how
much, if any, of the male-specific part of DSX m
contributes to the OD2 domain in M. scalaris (Fig. 4C, OD2 ?,
male-specific part).

A common property of the male-specific parts of the three 
DSX m homologues is the greater length compared with that 
of the female-specific parts of DSX f ; the latter ones consist 
of 30 amino acids while the male-specific part consists of 
293 amino acids in M. scalaris, 152 amino acids in 
D. melanogaster, and 109 amino acids in B. tryoni. The con- 
spicuous length in M. scalaris DSX m is accounted for by 
two copies of a direct repeat (Fig. 1B; repeat 1 and repeat 2,
Fig. 5). The first copy spans 91 amino acids from position
383-473 of M. scalaris DSX m , the second one 87 amino acids,
from position 474-560. The two copies are 50% identical
and 65% similar at the amino acid level. The similarity is
also apparent at the nucleotide level (63% identity).

Intron 3 

The inclusion or exclusion of the female-specific exon in 
D. melanogaster dsx depends on the recognition of a weak 3' 
splice site with a purine-rich polypyrimidine tract in the pre- 
ceding intron (intron 3; for convenience, we use the same 
term for that intron in M. scalaris). To retrieve the intron 3 
sequence of M. scalaris, we performed a nested PCR on 
genomic DNA with primers DSXK788F/DSXf562R in the 
first and with DSXK788F/DSXFbr in the second round. A 
628-bp fragment of the genomic M. scalaris dsx sequence was 
amplified and cloned in pMSW2473 (Acc. No. AF283697). 
The fragment contains a short intron of 52 bp (position 100-151
in pMSW2473, Fig. 6).

The general structure of intron 3 in M. scalaris is similar
to that in D. melanogaster, D. virilis (Burtis and Baker
1989), and B. tryoni (Shearman and Frommer 1998). The 5'
splice sequence is compatible with the 5' splice sequences of
D. melanogaster, compiled by Mount et al. (1992). The 3'
splice site, however, appears to be a suboptimal splice acceptor,
as only 6 of 12 positions in the polypyrimidine
stretch are pyrimidines in M. scalaris (Fig. 6). Similarly, in
D. melanogaster 6 of 12, in D. virilis 7 of 12, and in 
B. tryoni 5 of 12 positions are pyrimidines (Burtis and Baker 
1989; Shearman and Frommer 1998). 
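
The "x of 12" measure used here is a simple count of pyrimidines in the 12-position tract upstream of the 3' splice site. A minimal sketch follows; the tract shown is a hypothetical 12-nt example, not the M. scalaris sequence, and the names are illustrative.

```python
PYRIMIDINES = set("CTU")


def pyrimidine_count(tract: str) -> int:
    """Number of pyrimidine positions (C, T, or U) in a tract."""
    return sum(base in PYRIMIDINES for base in tract.upper())


toy_tract = "AATCCGTAAGCT"  # hypothetical 12-nt tract, for illustration only
print(f"{pyrimidine_count(toy_tract)} of {len(toy_tract)} positions are pyrimidines")
```

A strong splice acceptor would score close to 12 of 12 on this measure, so values of 5 to 7 are what mark these sites as suboptimal.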

Regulatory elements in the female-specific exon 

In D. melanogaster dsx, female-specific splicing at the 
weak 3' splice site of intron 3 is activated by a cis-acting
splicing enhancer (dsxRE) and sex-specific trans-acting factors
(for review, Lopez 1998). The dsxRE is located within
the untranslated part of the female-specific exon and consists 
of "13-nt repeat elements" (Burtis and Baker 1989; Inoue et 
al. 1992) and a "purine-rich element" (Lynch and Maniatis 
1995). In an alignment of the respective dsx segments from 
M. scalaris and D. melanogaster, we found little nucleotide 
sequence conservation (data not shown). Nevertheless, 
M. scalaris dsx contains six elements in which 10-13 nucleotides
(nt) are identical to the 13-nt repeat elements (Table 1).
Their distance of 247-602 bp downstream from the
alternative splice site is similar to that in D. melanogaster
(295-566 bp, Burtis and Baker 1989), D. virilis (332-458 
bp, Hertel et al. 1996), and B. tryoni (373-525 bp, Shearman 
and Frommer 1998). 

We found a purine-rich sequence in the female-specific 
segment of M. scalaris dsx f 412 bp downstream of the alternative
splice site. It consists of 20 nt, 18 of which are purines
(position 1310-1329 in pMSWc178). Purine-rich
sequences were found at similar positions in 
D. melanogaster (Lynch and Maniatis 1995), D. virilis 
(Hertel et al. 1996), and B. tryoni (Shearman and Frommer 
1998). We did not find the direct repeat identified in the 
purine-rich element of D. melanogaster, but this repeat is not 
conserved in B. tryoni or in D. virilis. 

Molecular phylogeny of the DBD/OD1 domain

A tblastn search in the nr database with the zinc-finger- 
like DNA-binding domain (DBD/OD1) of M. scalaris DSX 
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Fig. 6. The Megaselia scalaris dsx intron between the 5' common segment and the female-specific exon, compared with the corre- 
sponding intron 3 of Drosophila melanogaster, D. virilis (Burtis and Baker 1989), and Bactrocera tryoni dsx (Acc. No. AF040077, 
Shearman and Frommer 1998). Pyrimidines in the polypyrimidine tract (grey box) are presented in bold letters. 



M. scalaris 
D. melanogaster _ 
D. virilis 
8. tryoni 



5' splice site intron 3 

5' common j 
segment ^ 

ATAAGCGAAG| gtagtatgaaacttattaaattttttcttcaaatt< 



3* splice site 

|^ female-specific 



segment 



caaatccttcadGACAATATGC • 



•ATCGAAGAGGj gtaagt • 
Igtaagc • 



•ATTGAGGAAG gtaagt ■ 



92 bp 
88 bp 
108 bp 



atctgatctaaaccag GCCAATACGT • 




ttag |GCCAGTATGT ■ 



ttag GCCAGTATGT • 



Fig. 7. DNA-binding domains homologous to the DBD/OD1 domain of Megaselia scalaris DSX. (A) Alignment of the sequences ar- 
ranged according to increasing distance from M. scalaris DSX. Black boxes, amino acids identical to the M. scalaris sequence; shaded 
boxes, amino acids similar to the M scalaris sequence. Accession numbers: DSX, Drosophila (M25292); DSX, Bactrocera 
(AF029675); DSX, Bombyx (AV398350); DMO, Oreochromis (AF203490); DMRT1, Oreochromis (AF203489); DMRT1, Gallus 
(AF123456); DMRT1, Mus (NM_015826); DMRT1, Sus (AF216651); DMRT1, Homo (AJ276801); DMRT4, Tetraodon (AJ251456); 
DMRT2, Homo (Y19052); TERRA Danio (AF080622); CAA21612, Caenorhabditis (AL032637); AAF48261, Drosophila (AE003492); 
AAF55843, Drosophila (AE003733). (B) Distance tree. The tree was constructed according to the neighbor-joining method of Saitou 
and Nei (1987). 



[Fig. 7 panels not reproduced. (A) Alignment of the DBD/OD1 sequences listed in the caption. (B) Neighbor-joining distance tree of the same sequences; scale bar, 0.1.]



returned a series of protein entries with a similar domain.
For a phylogenetic comparison with M. scalaris DSX, we
selected the 14 entries with highest scores, discarding
redundancies and sequences with incomplete domains, plus
B. mori DSX, which had been found in a different database
(see above). The amino acid sequence alignment in Fig. 7A
shows the DNA-binding domain to be well conserved among
these proteins. The dendrogram in Fig. 7B displays the
distances between the DBD/OD1 sequences. D. melanogaster
and vertebrates contribute more than one homologous gene
to the selected group. They form clusters of paralogous
sequences. The four DSX sequences are closely related and
form a distinct group. The branch points in this group reflect
the phylogenetic relationship among the species as derived
from their taxonomic assignment. Drosophila and
Bactrocera belong to the Schizophora, and Megaselia to the
Aschiza among flies. The silk moth Bombyx mori is a
representative of a different insect order, Lepidoptera. All other
proteins have less similarity to this group and they lack the
OD2 domain of DSX (not shown).

Fig. 8. Sex-determining pathways in Drosophila melanogaster
and Megaselia scalaris.
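The distance tree in Fig. 7B starts from pairwise distances between the aligned DBD/OD1 sequences. As a minimal sketch of that first step only, the following Python computes uncorrected p-distances (the fraction of compared positions that differ). The neighbor-joining clustering itself is not reproduced. The function names and the three toy sequences are hypothetical and are not taken from Fig. 7A, and because the text does not state which distance measure was used, the p-distance is an assumption.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions, ungapped in both sequences, that differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Skip any column in which either sequence carries a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (name1, name2), names in sorted order."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment (not real DSX/DMRT sequences).
toy_alignment = {
    "seqA": "MKRSPRC-AR",
    "seqB": "MKRTPRCGAR",
    "seqC": "MARSPKC-AK",
}
toy_distances = distance_matrix(toy_alignment)
for pair, dist in toy_distances.items():
    print(pair, round(dist, 3))
```

A matrix of this kind is the usual input to a neighbor-joining implementation; sequences that cluster together, as the four DSX sequences do in Fig. 7B, are those with the smallest mutual distances.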

Discussion 

Orthologues of dsx 

Transcripts of M. scalaris dsx come in two variants: a
female-specific form, dsxF, and a male-specific form, dsxM, like
those from D. melanogaster and B. tryoni (Burtis and Baker
1989; Shearman and Frommer 1998). The composition of
the dsx variants, however, differs to some degree. In
D. melanogaster and B. tryoni, dsxF and dsxM each consist of
a 5' common segment and a 3' sex-specific segment. In
M. scalaris, dsxF consists of a common 5', a female-specific,
and a common 3' segment, whereas dsxM consists of only the
5' and the 3' common segments. This results in longer dsxF
than dsxM transcripts.

Despite the different compositions of the transcripts, the
putative proteins DSXF and DSXM of M. scalaris are
composed similarly to those of D. melanogaster and B. tryoni.
They consist of a common N-terminal part and a sex-specific
C-terminal part. This is achieved in dsxF of M. scalaris by a
translation stop in the female-specific segment, while in dsxM
translation stops in the 3' common segment.

Sequence conservation is high in the two functional do- 
mains DBD/OD1 and OD2 but is not restricted to these re- 
gions. DBD/OD1 is a well-conserved domain in a wide 
range of metazoan species and there are several paralogues 
apparent in vertebrates and D. melanogaster. Molecular phylogeny
based on this domain shows M. scalaris dsx to be
most closely related to D. melanogaster, B. tryoni, and 
B. mori dsx, indicating that these are orthologous genes. 



Sex-specific splicing 

In intron 3, we find a poor splice acceptor with a purine-rich
polypyrimidine stretch, similar to that present in
D. melanogaster. Some 250-600 bp downstream in the
female-specific segment, a purine-rich element and 13-nt
repeat elements are conserved. Comparable elements in
D. melanogaster form the cis-acting splicing enhancer dsxRE,
to which a trans-acting multiprotein complex binds. This
complex consists of TRA2, RBP1, and the female-specific protein
TRA, and activates splicing at the weak female-specific 3'
splice site (for review, Lopez 1998). The distance of the
enhancer elements from the alternative splice site is critical
for their function in D. melanogaster. When dsxRE was
experimentally moved to less than 150 bp from intron 3, the
splicing enhancer function was constitutive and independent
of the presence of the TRA2/RBP1/TRA complex (Tian and
Maniatis 1994). The distance in M. scalaris is appropriate
for a corresponding control mechanism based on a tra-like
step in the sex-determining cascade.

Functional conservation of dsx 

Drosophila DSX is responsible for transcriptional activation
(DSXF) or suppression (DSXM) of yolk protein genes,
yp-1 and yp-2, in the fat body (for review, Bownes 1994 and
references therein; An and Wensink 1995a, 1995b) but is
suspected to regulate sex-specific transcription of more
genes (Schutt and Nothiger 2000). Drosophila DSX is
known to bind to DNA in the form of a homodimer (An et
al. 1996; Erdman et al. 1996). The strong conservation of
the DNA-binding and oligomerization domains DBD/OD1
and OD2 indicates conservation of these essential functions
in M. scalaris DSXF. A conserved region in front of OD2 is
rich in proline and serine residues, but no functional
significance of this region is yet known. Some authors suggest that
this region mediates transcriptional regulation and (or)
protein-protein interaction (Raymond et al. 1999; Yi and
Zarkower 1999).

Megaselia scalaris DSXM contains the same conserved
regions as DSXF with one exception: the sex-specific part of
OD2. Due to the overall low degree of conservation in the
male-specific part of DSXM, we can draw no conclusion
regarding the extension of OD2 in that protein. In
D. melanogaster, DSXF and DSXM bind to the same site
within the promoter region of the yolk protein genes. While
DSXF activates transcription, DSXM represses it. The
conspicuous size of the male-specific part of DSXM in
M. scalaris may help to fulfill that function by sterically
obstructing the binding of activators for the yolk protein
gene transcription to the promoter, as suggested for
D. melanogaster DSXM by An and Wensink (1995a) and
Cho and Wensink (1998).

Conservation of the sex-determining mechanism 

Results presented in this paper confirm and extend an ear- 
lier report of our group on M. scalaris dsx (Sievert et al. 
1997). From these, a picture of a part of the sex-determining 
cascade in M. scalaris emerges (Fig. 8). Presence or absence 
of the epistatic (and transposable) Maleness factor is the pri- 
mary sex-determining signal (Traut and Willhoeft 1990; 
Traut 1994). It exerts its control on an unknown gene in the 
cascade. Sxl, which mediates the sex-determining signal in
D. melanogaster (for review, Schutt and Nothiger 2000), is
not part of the M. scalaris sex-determining cascade (Megsxl,
Fig. 8; Sievert et al. 1997; Sievert et al. 2000). The next step
in the sex-determining cascade of Drosophila, transformer
(tra), has not yet been isolated in M. scalaris, but the presence
of binding sites for the splice-activating TRA2/RBP1/TRA
complex hints at its presence. The next step in the cascade,
dsx, is conserved in M. scalaris. All but one of the structural
details considered essential for its proper function as a
transmitter of the sex-determining signal are conserved. The one
exception is the male-specific component of OD2.

It is obvious that, while primary and secondary sex- 
determining steps are not conserved, subsequent steps are 
conserved among flies even when they belong to such 
distantly related groups as Schizophora (Drosophila, 
Bactrocera) and Aschiza (Megaselia). The conservation of 
functional domains in dsx of the silk moth, Bombyx mori, in- 
dicates that this step in the sex-determining pathway is con- 
served in an even wider range of different insect orders. 

It is not clear yet how much of this pathway is conserved
in animal groups other than insects. There are intriguing
observations from nematodes and vertebrates. Drosophila dsxM
rescues mab-3 mutants in the nematode C. elegans
(Raymond et al. 1998). The vertebrate gene Dmrt1/DMRT1 is
expressed in the genital ridge of embryos and in testes of
adults and probably plays a role in sexual development of
vertebrates (Raymond et al. 1998; Raymond et al. 1999;
Smith et al. 1999; De Grandi et al. 2000; Guan et al. 2000).
Both DMRT1 and MAB-3 are proteins containing the same
type of zinc-finger-like DNA-binding domain as DSX.
However, DMRT1 as well as MAB-3 lack the OD2 domain that
is characteristic for DSX, and there are other genes in
C. elegans with higher similarity to DSX (see Fig. 7B). It is
obvious that this type of DNA-binding protein plays a wide
role in the regulation of sexual development. However, it is
not clear whether these proteins play a key part in sex
determination, as dsx does in insects.

ABSTRACT

We have compared the RNA sequences and secondary structures of the Drosophila melanogaster and
Drosophila virilis doublesex (dsx) splicing enhancers. The sequences of the two splicing enhancers are highly
divergent except for the presence of nearly identical 13-nt repeat elements (six in D. melanogaster and four
in D. virilis) and a stretch of nucleotides at the 5' and 3' ends of the enhancers. In vitro RNA structure probing
of the two enhancers revealed that the 13-nt repeats are predominantly single-stranded. Thus, both the
primary sequences and single-stranded nature of the repeats are conserved between the two species. The
significance of the primary sequence conservation was demonstrated by showing that the two enhancers are
functionally interchangeable in Tra-/Tra2-dependent in vitro splicing. In addition, inhibition of splicing enhancer
activity by antisense oligonucleotides complementary to the repeats demonstrated the importance of the
conserved single-stranded structure of the repeats. In vitro binding studies revealed that Tra2 interacts with each
of the D. melanogaster repeat elements, except for repeat 2, with affinities that are indistinguishable, whereas
Tra binds nonspecifically to the enhancer. Taken together, these observations indicate that the organization
of sequences within the dsx splicing enhancers of D. melanogaster and D. virilis results in a structure in which
each of the repeat elements is single-stranded and therefore accessible for specific recognition by the
RNA-binding domain of Tra2.

Keywords: phylogeny; regulated splicing; RNA/protein interactions; RNA structure 



INTRODUCTION 

Sex-specific alternative splicing of the Drosophila
melanogaster doublesex (dsx) pre-mRNA requires the
regulatory proteins Transformer (Tra) and Transformer 2
(Tra2), and a Tra- and Tra2-dependent splicing
enhancer (the doublesex repeat element dsxRE) that is
located 300 nt downstream of the female-specific 3' splice
site (for review see Baker, 1989; Maniatis, 1991). Tra
is produced exclusively in females by the sex-specific
splicing of Tra pre-mRNA, whereas Tra2 is expressed
in both males and females (Boggs et al., 1987; Amrein
et al., 1988). As shown in Figure 1, dsx pre-mRNA
contains six exons: three common exons (exons 1-3), a
female-specific exon (exon 4), and two male-specific
exons (exons 5 and 6). In males, exon 3 is joined to exon 5
to produce an mRNA containing exons 1, 2, 3, 5, and 6.
In females, exon 3 is joined to exon 4 to produce an
mRNA containing exons 1, 2, 3, and 4 (Burtis & Baker,
1989). The female-specific 3' splice site in intron 3
deviates significantly from the consensus 3' splice site, so
it is not recognized by the splicing machinery in the
absence of Tra and Tra2, or in the absence of the dsxRE
(Burtis & Baker, 1989; Hedley & Maniatis, 1991;
Hoshijima et al., 1991; Ryner & Baker, 1991; Tian & Maniatis,
1992, 1993; Zuo & Maniatis, 1996).
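The exon-joining rule described above can be written as a small decision function. This is only an illustrative sketch: the function and constant names are hypothetical, and treating the male pattern as the default whenever Tra, Tra2, or the dsxRE is missing is a simplifying assumption drawn from the statement that the female-specific 3' splice site is not recognized in those cases.

```python
COMMON_EXONS = (1, 2, 3)   # exons shared by both sexes
FEMALE_EXONS = (4,)        # female-specific exon
MALE_EXONS = (5, 6)        # male-specific exons


def dsx_mrna_exons(tra: bool, tra2: bool, dsxre: bool = True) -> tuple:
    """Return the exons joined in the mature dsx mRNA.

    Assumption: the weak female-specific 3' splice site is used only when
    Tra, Tra2, and the dsxRE enhancer are all present; otherwise exon 3 is
    joined to exon 5 (the male pattern).
    """
    female_site_used = tra and tra2 and dsxre
    return COMMON_EXONS + (FEMALE_EXONS if female_site_used else MALE_EXONS)


print("female:", dsx_mrna_exons(tra=True, tra2=True))
print("male:  ", dsx_mrna_exons(tra=False, tra2=True))
```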

The dsxRE is a 270-nt regulatory element that
contains six 13-nt repeat sequences, and a purine-rich
element (PRE) located between repeats 5 and 6 (Burtis &
Baker, 1989; Lynch & Maniatis, 1995). The presence of
both types of elements is required for efficient
Tra/Tra2-dependent use of the female-specific 3' splice site. Tra
and Tra2 bind to the dsxRE and facilitate the
recruitment of splicing factors to the weak female-specific 3'
splice site (Hedley & Maniatis, 1991; Inoue et al., 1992;
Tian & Maniatis, 1992, 1993; Zuo & Maniatis, 1996).
Thus, in females, the splicing machinery is
preferentially directed to an intrinsically weak splice site
recognition signal.

FIGURE 1. Sex-specific alternative splicing pattern of dsx pre-mRNA and the dsx splicing enhancer (dsxRE). Top: Open
boxes represent the common exons 1, 2, and 3, the light hatched box represents the female-specific exon 4, and the dark
hatched boxes represent the male-specific exons 5 and 6. The sex-specific splicing pattern is illustrated by the lines above
(female) and below (male) the pre-mRNA. Sites of cleavage and polyadenylation are labeled poly A. Bottom: Enlargement
shows the organization of the D. melanogaster dsxRE comprised of six 13-nt repeat elements (boxes 1-6) and a PRE located
between repeats 5 and 6.

In vitro splicing experiments have shown that, in
addition to Tra and Tra2, one or more members of the
SR (serine/arginine) family of general splicing factors
are required for dsxRE-dependent recognition of the
female-specific 3' splice site (Tian & Maniatis, 1993).
UV-crosslinking experiments and the characterization
of affinity-purified enhancer complexes have shown
that Tra and Tra2 recruit SR proteins to the dsxRE to
form a multicomponent splicing enhancer complex
(Tian & Maniatis, 1992, 1993). Both Tra and Tra2
contain SR domains, and are therefore considered
members of the superfamily of SR-containing splicing
factors (for review see Fu, 1995). Although Tra2 and SR
proteins contain an RNA recognition motif (RRM)
(Bandziulis et al., 1989), Tra lacks a known
RNA-binding domain. Binding studies in HeLa cell nuclear
extracts or with purified recombinant Tra, Tra2, and SR
proteins revealed that Tra2 binds with significant
specificity to the dsxRE, whereas Tra exhibits low or no
specificity (Hedley & Maniatis, 1991; Inoue et al., 1992;
Tian & Maniatis, 1992; Lynch & Maniatis, 1995, 1996).
In addition, these studies showed that Tra, Tra2, and
SR proteins bind cooperatively to the dsxRE. Recent
UV-crosslinking experiments have shown that Tra and
Tra2 recruit a specific SR protein to the dsx repeats
(Lynch & Maniatis, 1996). This SR protein binds to the
5' ends of the repeats, whereas Tra2 binds to the
middle and 3' regions. Tra is an essential component of this
heterotrimeric complex, but does not appear to directly
contact RNA.



These observations, in conjunction with studies 
showing that these proteins interact with each other 
through their SR domains (Wu & Maniatis, 1993; Amrein
et al., 1994), suggest that a complex involving
multiple specific protein-protein and protein-RNA in- 
teractions is assembled on the dsxRE. 

Tra2 and SR proteins contact RNA through their
RRMs, but relatively little is known about the structure
of the binding sites in the dsxRE. Recently, the
three-dimensional structure of a complex between the
RNA-binding protein U1A and its binding site in U1 snRNA
was determined (Oubridge et al., 1994). U1A, which
contains an RRM similar to that present in SR proteins
and Tra2, specifically interacts with the single-stranded
region of a hairpin structure formed by U1 snRNA. The
single-stranded region provides a surface for extensive
interactions between the protein and exposed
nucleotides. These results and other studies suggest that
interactions with single-stranded RNA might be a
general mode of recognition for the RRM domain/RNA
complex (Nagai et al., 1995).

Although the dsxRE is the only regulated splicing
enhancer thus far characterized, a number of constitutive
splicing enhancers have been described (Sun et al.,
1993; Watakabe et al., 1993; Dominski & Kole, 1994;
von Oers et al., 1994; Ramchatesingh et al., 1995). Most
of these elements are short (approximately 10 nt)
purine-rich sequences, but a few are pyrimidine-rich.
None thus far reported resemble the dsx repeat
sequences. Both types of enhancers require SR proteins
for their activities, but only the dsxRE requires Tra and
Tra2. The primary difference between constitutive
enhancers and the dsxRE is that the former can function
only within 100 nt of the affected 3' splice site, whereas
the latter can function at least 1,000 nt away (Tian &
Maniatis, 1994). This ability to function at a distance
may be due, in part, to the complex organization of the
dsxRE, and to Tra and Tra2, which may function to
promote enhancer complex assembly and stability.
Consistent with both of these possibilities is the
observation that individual dsx repeats or the PRE function
as constitutive (Tra- and Tra2-independent) splicing
enhancers in vitro when located within 100 nt of the
female-specific 3' splice site (Lynch & Maniatis, 1995).
Although individual repeats can function as Tra- and
Tra2-dependent enhancers at a distance (Hoshijima
et al., 1991), maximal splicing efficiency in vitro
requires the combination of multiple repeats, the PRE,
and Tra and Tra2 (Tian & Maniatis, 1994; Lynch &
Maniatis, 1995).

The unique organization of the dsxRE suggests the
interesting possibility that this arrangement of
sequences results in the formation of a specific
secondary and tertiary structure that is required for optimal
Tra- and Tra2-dependent splicing. To investigate this
possibility, we have compared the sequence of the
D. melanogaster dsxRE with the corresponding sequence
from the distantly related D. virilis. In addition, we
have conducted chemical and enzymatic RNA probing
experiments to investigate the secondary structure of
the dsxREs from both species. This structural
information was then used to optimize computer-assisted
RNA-folding predictions. These studies revealed
significant phylogenetic conservation of the primary
sequence of the repeat elements, and the secondary
structure of the complete element.

We also conducted RNase footprinting experiments 
with purified Tra and Tra2 proteins and both dsxREs 
to investigate specific protein-RNA interactions in the 
complex. Based on these assays, we found that Tra2 
binds to each of the repeat elements with affinities that 
are indistinguishable, whereas Tra binds nonspecifically
to the dsxRE. We conclude that the organization
of the dsxRE results in the formation of an RNA sec- 
ondary structure in which each of the repeats is present 
as single-stranded RNA that is recognized specifically 
by the RRMs of both Tra2 and SR proteins. 

RESULTS 

Comparison of the D. melanogaster dsxRE 
and the corresponding region in the 
D. virilis dsx pre-mRNA 

The female-specific fourth exon of D. virilis was
identified from a genomic DNA library (Newfeld et al.,
1991) using the PCR-amplified third intron of D. virilis
as the hybridization probe (intron sequence from
Burtis & Baker, 1989). The DNA sequence of this exon was
determined and compared to the corresponding region
of the D. melanogaster dsx gene. An alignment of the
two sequences revealed several highly conserved
regions between the two species (Fig. 2). The first 100 nt
of the fourth exon are 90% identical between the two
species, encoding only 1 different amino acid of the
30 translated amino acids (98% homology on the amino
acid level). Because this region contains the coding
sequence for the carboxy terminus of the female-specific
dsx protein, this high degree of sequence conservation
is not unexpected. Previously, an interspecific
nucleotide sequence comparison of the Drosophila hsp82
gene demonstrated that the coding regions of the
distantly related D. melanogaster and D. virilis species are
90% homologous at the DNA level and 97-99%
identical at the amino acid level (Blackman & Meselson,
1986). In contrast, little or no sequence conservation
was observed in the intron or the nontranslated exon 1
sequences of hsp82. Consistent with this observation,
the conservation of noncoding sequences of exon 4 in
dsx is weak, except at the 5' and 3' ends of the dsxRE
and the repeat sequences. As shown in Figure 2, there
are two regions of approximately 30 nt at the 5' and
3' ends of the dsxRE that are highly conserved.
Although these regions in the D. melanogaster dsxRE are
not required for maximal levels of Tra- and
Tra2-dependent splicing in vitro (K.W. Lynch & T. Maniatis,
unpubl.), they may be required for a regulatory
function in vivo.

By introducing gaps into both dsxRE sequences, it is 
possible to align the 13-nt repeat sequences of the 
D. melanogaster dsxRE with nearly identical sequences 
in the D. virilis exon 4. In agreement with a recent 
study (Heinrichs & Baker, 1995), six repeat elements 
are present in the D. melanogaster dsxRE, whereas only
four such repeats were found in the enhancer region 
of D. virilis. In addition, the nucleotide composition of 
the 13-nt repeat elements varies slightly in D. melano- 
gaster (Burtis & Baker, 1989), whereas all of the repeats 
in D. virilis are identical to each other and to the pre- 
dominant repeat sequence in D. melanogaster. Thus, the 
repeat sequences are highly conserved between the 
two species, but the surrounding sequences are nearly 
random, suggesting that specific sequences are not re- 
quired for dsxRE function outside of the repeats. 
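A scan for near-identical copies of a 13-nt element, of the kind used to compare the two enhancers, can be sketched as a sliding-window search that tolerates a small number of mismatches. The function name, the toy sequence, and the mismatch threshold are hypothetical. The motif string is an assumed reading of the predominant repeat from the alignment in Figure 2 and should not be taken as an authoritative consensus.

```python
def find_repeats(seq: str, motif: str, max_mismatches: int = 1) -> list:
    """Return (start, window, mismatches) for each window close to the motif.

    Positions are 0-based; a window is reported when it differs from the
    motif at no more than max_mismatches positions.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Assumed 13-nt repeat (DNA alphabet) and a made-up sequence carrying one
# exact copy and one copy with a single substitution.
MOTIF = "TCTTCAATCAACA"
toy_seq = "GGAT" + MOTIF + "ACGTACGT" + "TCATCAATCAACA" + "TTTT"

repeat_hits = find_repeats(toy_seq, MOTIF, max_mismatches=1)
for start, window, mismatches in repeat_hits:
    print(start, window, mismatches)
```

Allowing one mismatch reflects the observation that the repeats vary slightly in D. melanogaster, whereas a threshold of zero would suffice for the identical repeats reported for D. virilis.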

The D. virilis dsxRE does not contain a sequence that 
is identical to the D. melanogaster PRE. However, there 
is a purine-rich sequence located immediately down- 
stream of the fourth repeat in the D. virilis dsxRE. The 
factors that bind to this purine-rich sequence are simi- 
lar to the factors associated with the D. melanogaster 
PRE, and both purine-rich sequences are sufficient to 
act as a constitutive enhancer when in close proximity 
to the weak 3' splice site (K.W. Lynch & T. Maniatis, 
unpubl.). Thus, the function, but not the exact se- 
quence, of the PRE may be conserved between the two 
species. 



K.J. Hertel et al.



[Figure 2 alignment not reproduced: pairwise alignment of the D. virilis (D.v.) and D. melanogaster (D.m.) fourth-exon sequences, with the 13-nt repeat elements labeled and boxed.]



FIGURE 2. Sequence alignment of the female-specific fourth exon from
D. melanogaster and the distantly related D. virilis. Shown are the first
650 nt of the fourth exon, including the dsxRE. Vertical bars indicate
identical nucleotides. Alignment was accomplished by using the Sequence
Analysis Software Package, version 7.2, from the Genetics Computer
Group. The 13-nt repeat elements identified in each species are labeled
and boxed. Other boxed regions are regions of high sequence homology.

Comparison of the D. melanogaster and D. virilis
splicing enhancer activities in HeLa cell
nuclear extracts

To investigate the biological significance of the
sequence conservation between the D. melanogaster and
D. virilis dsxREs, we compared their activities in an in
vitro splicing assay. To accomplish this, a pre-mRNA
substrate (D2) was constructed in which the dsxRE of
D. melanogaster was replaced by the enhancer region of
D. virilis containing only four repeats. Consistent with
earlier studies, splicing of substrate D1 was observed
only in the presence of Tra and Tra2 (Fig. 3). Similarly,
the in vitro splicing activity of the D. virilis dsxRE
required D. melanogaster Tra and Tra2, and the
concentrations of these proteins required for maximal splicing
efficiency are indistinguishable from the concentrations
required for substrate D1. Thus, the dsxREs from the two
species can substitute functionally for each other in
in vitro splicing assays.

The structural analysis of the dsx enhancer described
below was conducted with RNAs containing portions
of the fourth exon, but lacking the adjacent intron. It
is therefore important to demonstrate that the isolated
dsxRE can form a specific regulatory complex. In fact,
several lines of evidence show that dsxRE RNA
fragments are capable of specifically binding to or
competing for Tra/Tra2. For example, an isolated dsxRE
containing all six repeat elements can specifically
inhibit the splicing of D1 (Tian & Maniatis, 1993) and it
interacts specifically with Tra and Tra2 in nuclear
extracts (Tian & Maniatis, 1992). Similar competition
experiments conducted with the isolated enhancer elements
used here are in agreement with the results of Tian and
Maniatis (1993). The titration of enhancer competitor
RNA reduced the splicing efficiency of D1 dramatically,
whereas the presence of a nonspecific competitor at the
same concentration had no significant effect (data
not shown). In addition, the splicing efficiency of the
Tra/Tra2-independent β-globin pre-mRNA was
unaffected by either specific or nonspecific competitor
RNA. Thus, the isolated dsxREs inhibit dsx splicing
specifically by competing for trans-acting factors that
are not components of the basic spliceosome, but are
essential for the Tra/Tra2-dependent recruitment of the
spliceosome to the dsx pre-mRNA. It is therefore
reasonable to argue that enhancer elements in the absence
of any splice sites are capable of binding to or
competing for regulatory factors required for female-specific
splicing of dsx pre-mRNA.

FIGURE 3. dsxRE from D. virilis can functionally substitute for the
dsxRE in D. melanogaster. A: Both oligonucleotides D1 and D2 share
part of exon 3 and the entire regulated intron derived from
D. melanogaster. Substrate D2 contains the D. virilis dsxRE in place of the
D. melanogaster dsxRE. B: Splicing efficiencies of D1 and D2 are
compared as a function of Tra/Tra2 concentration. C: Quantitation of the
data in B. At the Tra/Tra2 concentrations used, the splicing efficiency
of D1 (open circles) is indistinguishable from the splicing efficiency
observed for D2 (closed circles).

Secondary structure analysis of the
D. melanogaster and D. virilis dsxREs

Direct enzymatic and chemical in vitro structure
probing was used to identify regions within the enhancer
elements that are single stranded or involved in
secondary or higher-order interactions (Knapp, 1989; Krol
& Carbon, 1989). The digestion pattern of the
D. melanogaster dsxRE RNA revealed that the 13-nt repeat
elements are predominantly in a single-stranded
configuration (Fig. 4). Our analysis also identified several
base-paired regions with different sensitivities to RNase
V1, an enzyme that specifically recognizes nucleotides
involved in base pairing. Similarly, absence of RNase
T1 digestion is indicative of guanosine residues that are
not accessible for modification. These nucleotides are
thought to be in the immediate vicinity of or directly
involved in higher-order structural arrangements.

Three different permutations of the D. melanogaster 
enhancer region (R1-6, R2-5PRE, R2-6) were used to
address whether the nucleotides 5' or 3' from the repeat 
elements influence the folding pattern. Neither the re- 
moval of the first nor the last repeat element of the 
dsxRE changed the digestion pattern (data not shown). 
Thus, the in vitro folding of the enhancer element does 
not depend on interactions between the 5' and 3' ends 
of the RNA. 

FIGURE 4. In vitro structure probing of the dsxRE. A: RNase
digestion pattern of R1-6. Lanes: con, no RNase added; V1, RNase V1
digestion; OH, alkaline hydrolysis ladder; T1(D), RNase T1
digestion at denaturing conditions; T1(N), RNase T1 digestion at assay
conditions; A, RNase A digestion. Locations of repeats 1-6 and the
PRE are indicated on the right side. B: Summary of the various
oligonucleotides used for the dsxRE structure probing. R1-6 contains all
repeats present in the dsxRE of D. melanogaster and the PRE; R2-5PRE
contains repeats 2-5 and the PRE; R2-6 contains repeats 2-6,
including the PRE; and V1-4 contains the dsxRE from D. virilis containing
all four repeat elements.

In order to substantiate the secondary structure
information of the D. melanogaster enhancer region
obtained by nuclease digestion, the RNA was treated
with the single-strand-specific chemical modifier DMS
followed by primer extension (Krol & Carbon, 1989).
DMS methylates adenine and cytosine residues at the
N-1 and N-3 positions, respectively, with some
preference for adenine. Residues that interact with other
nucleotides through N-1- or N-3-mediated hydrogen
bonding are not accessible for the chemical
modification. The results of a series of DMS methylation
experiments are summarized in Figure 5A. As observed in
the RNase digestion experiments, the 13-nt repeat
elements are very accessible to chemical modification and
are therefore in a predominantly single-stranded
conformation. By contrast, the PRE appears to be in a
predominantly base-paired configuration.
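The logic of the DMS experiment can be illustrated with a short sketch: only adenine and cytosine residues are candidates for methylation, and of those, only residues not engaged in base pairing are expected to react. The function name, the toy sequence, and the set of paired positions are hypothetical, and the model ignores reactivity differences between adenine and cytosine as well as tertiary contacts.

```python
def dms_reactive_positions(rna: str, paired: set) -> list:
    """Return 0-based positions expected to be DMS-reactive.

    A position is reported when the residue is A or C (the bases DMS
    methylates at N-1 and N-3, respectively) and is not in the set of
    base-paired positions supplied by the caller.
    """
    return [
        i for i, nt in enumerate(rna.upper())
        if nt in "AC" and i not in paired
    ]


# Hypothetical 7-nt RNA in which positions 0, 1, 5, and 6 are base paired.
toy_rna = "GACUACG"
toy_paired = {0, 1, 5, 6}
reactive = dms_reactive_positions(toy_rna, toy_paired)
print(reactive)
```

Read in reverse, the same rule underlies the interpretation in the text: A and C residues that are strongly modified are taken as single stranded, and protected ones as candidates for base pairing.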

Similar experiments were conducted with the D.
virilis dsxRE. As with the D. melanogaster dsxRE, the
13-nt repeat elements of the D. virilis RNA are predom- 
inantly in single-stranded regions. Only one of the four 
repeats appears to be involved in some secondary 
structure (Fig. 5B). 

Computer-assisted folding of the enhancer region 

The enhancer regions of D. melanogaster and D. virilis 
were folded using the MFOLD and PLOTFOLD appli- 
cation programs (version 7.2) from the Genetics Com- 
puter Group, University of Wisconsin Biotechnology 
Center. With the availability of biochemical structure 
data collected in the RNase and chemical modification 
experiments described above, the folding of several nu- 
cleotides within each RNA was prevented prior to the 
application of the program. Because MFOLD can gen- 
erate and analyze suboptimal structures, a representa- 
tive secondary structure depiction was chosen from a 
series of optimal and suboptimal structures based on 
its agreement with the remaining experimental data. 
Figure 6 illustrates structure representations for the en- 
hancer regions of D. melanogaster and D. virilis that best 
fit the biochemical data. With the exception of repeat 
2 in each enhancer, all of the 13-nt repeats are single 
stranded. In the representations, approximately 40% 
of each RNA is involved in Watson-Crick base pairing. 
This is a relatively low percentage compared to the 
well-defined RNA structures within the ribosomal 
RNAs (Noller, 1984), but similar to the extent observed 
for U2 snRNA in Tetrahymena (Zaug & Cech, 1995). 
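
The "approximately 40% base paired" figure can be read directly off a structure written in dot-bracket notation. The following is a minimal sketch, assuming a structure string in which `(` and `)` mark paired positions and `.` marks unpaired ones; the toy structure is hypothetical and is not the dsxRE fold shown in Figure 6.

```python
def paired_fraction(dot_bracket: str) -> float:
    """Return the fraction of positions that are base paired.

    '(' and ')' count as paired, '.' as unpaired. The string is checked
    for balanced brackets before the fraction is computed.
    """
    if not dot_bracket:
        raise ValueError("structure string is empty")
    depth = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without an opening partner")
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced brackets")
    paired = sum(ch in "()" for ch in dot_bracket)
    return paired / len(dot_bracket)


# Hypothetical 20-nt hairpin: 8 of 20 positions are paired.
toy_structure = "..((((....))))......"
print(f"{paired_fraction(toy_structure):.0%} of positions are base paired")
```

Applied to the structures chosen from the MFOLD output, the same count would give the percentage quoted in the text.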

Tra/Tra2-dependent in vitro footprinting 
of the enhancer region 

Previous studies have shown that Tra2 can bind spe- 
cifically to the dsxRE (Hedley & Maniatis, 1991) or a 
short oligonucleotide comprised of two repeat ele- 
ments (Inoue et al., 1992), but the site of this interaction 
is not known. Therefore, we conducted in vitro foot- 
printing studies to identify Tra2 binding sites within 
the dsxRE. An initial binding specificity screen was es- 
tablished to evaluate the binding specificity of Tra or 
Tra2 to the enhancer probe. In this screen, a mixture 
of 5' end-labeled oligonucleotides cleaved at a single 
residue by alkaline hydrolysis was mixed with increas- 
ing concentrations of Tra or Tra2, and the bound mol- 
ecules were recovered by retention on nitrocellulose 
FIGURE 5. Summary of the enzymatic and chemical structure probing of the dsxRE from (A) D. melanogaster and from
(B) D. virilis. Symbols: long arrows, strong RNase A cut; small arrows, weak RNase A cut; boxed nucleotides, moderate
to strong RNase V1 cut; solid dots, strong T1 cuts; open ovals, G residues protected from T1 digestion; long vertical bars,
strong sites of DMS modification; small vertical bars, weak sites of DMS modification. Repeat elements in each dsxRE are
in bold.



filters (Gott et al., 1993). Analysis of the oligonucleotides
recovered from the filters indicated that retention
by Tra binding is very efficient for all truncated RNAs,
even for those that contain only dsx-unrelated polylinker
sequences. In contrast, only those RNAs containing
one or more complete 13-nt repeat elements
were retained on the filter by Tra2 (data not shown).
Thus, consistent with previous filter-binding and
UV-crosslinking studies (Lynch & Maniatis, 1995, 1996),
Tra interacts with RNA with little or no specificity,
whereas Tra2 binds specifically to the repeat elements.

To determine whether Tra2 can protect specific regions
of the dsxRE from RNase digestion, the R2-5PRE
was subjected to RNase A digestion in the presence of
increasing concentrations of Tra2. As shown in Figure 7,
all of the repeat elements present in R2-5PRE remain
accessible to RNase A digestion at Tra2 concentrations
lower than 20 nM. Surprisingly, all of the repeat elements
except repeat 2 are protected to similar degrees
at concentrations of >100 nM Tra2. This observation
correlates well with the observed binding affinity of
Kd = 50 nM for Tra2 measured under identical conditions
by a nitrocellulose filter-binding assay (data not
shown). The absence of selective binding of Tra2 to
individual repeats indicates that the 13-nt repeat elements
represent multiple Tra2 binding sites within the
enhancer region. These binding sites, except repeat 2,
are occupied at similar Tra2 concentrations. Similarly,
Tra2 binds to the PRE at approximately the same
concentration. Interestingly, our RNA structural probing
data (Fig. 4) and computer-assisted folding analysis
predict that the PRE is primarily in a double-stranded
configuration. However, with increasing concentrations
of Tra2, part of the PRE sequence becomes more
susceptible to RNase A cleavage and part of the region
is protected from this cleavage (Fig. 7). Thus, Tra2
binding appears to induce a conformational change in the
PRE RNA.
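
To see why little protection below 20 nM and strong protection above 100 nM fit a 50 nM binding constant, the expected site occupancy can be sketched with a simple one-site binding isotherm. This is an illustrative sketch only: it treats the 50 nM value quoted above as a dissociation constant, assumes Tra2 is in large excess over the trace-labeled RNA, and ignores any cooperativity between sites. The function name is hypothetical.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """One-site binding isotherm: fraction of RNA sites occupied.

    Assumes free protein is approximately equal to total protein,
    which holds when the RNA is present only in trace amounts.
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return protein_nM / (kd_nM + protein_nM)


KD_NM = 50.0  # binding constant quoted in the text, taken here as a Kd
for tra2_nM in (20.0, 50.0, 100.0):
    occupancy = fraction_bound(tra2_nM, KD_NM)
    print(f"{tra2_nM:5.0f} nM Tra2 -> {occupancy:.2f} of sites occupied")
```

Under these assumptions, fewer than a third of the sites are occupied at 20 nM Tra2, half at 50 nM, and two thirds at 100 nM, which is the qualitative trend seen in the footprint.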

A series of modification interference experiments
was performed to identify whether specific nucleotides
within the enhancer region are essential for Tra
or Tra2 binding. The protocol for modification interference
requires conditions that achieve a single modification
per RNA molecule and relies subsequently on the
separation of free RNA from the bound species by a
nitrocellulose filter-binding assay (Conway & Wickens,
1989). The comparison of RNA molecules selected by
Tra or Tra2 binding with the initial RNA pool did not
lead to the identification of nucleotides essential for the
interaction of Tra or Tra2 with the enhancer (data not
shown). These data support the above conclusion that
the enhancer region contains not one high-affinity site
for the interaction with Tra2, but several. Although
Tra2 is capable of binding to the dsxRE in the absence
of additional proteins, we note that the specificity of
Tra2 dramatically increases in the presence of SR proteins
(Lynch & Maniatis, 1995) and in nuclear extracts
(Lynch & Maniatis, 1996) under splicing conditions.
Thus, the experiments presented here show only that
each of the repeats can bind to Tra2 specifically. They
do not address the nature of the RNA-protein complex
assembled on a functional dsxRE.

FIGURE 6. Secondary structure models of the dsxREs
from D. melanogaster and from D. virilis. The structures
were generated as described in the text. Boxed areas
indicate the position of the 13-nt repeat elements and the
PRE. Standard Watson-Crick base pairs and G-U pairs
are indicated by bars.



Antisense inhibition of dsx pre-mRNA splicing

To determine whether the single-stranded configuration
of the 13-nt repeats in the D. melanogaster dsxRE
is required for enhancer function, we tested whether
the repeats can function in a double-stranded
configuration. An antisense oligonucleotide complementary
to the 13-nt repeat was annealed to the dsxRE and the
effects on pre-mRNA splicing were examined. As shown
in Figure 8, the presence of the antisense oligonucleotide
has a dramatic effect on the splicing efficiency of
substrate D2. In the absence of antisense oligonucleotide,
approximately 25% of the substrate is spliced. In contrast,
no spliced products are detected above the degradation
levels, regardless of whether the substrate was
preincubated with the antisense oligonucleotide at conditions
identical to those used in the secondary structure
analysis (Fig. 8A), heat annealed prior to the addition of
nuclear extract (Fig. 8B), or whether the antisense
oligonucleotide was added shortly after the incubation
of D2 in nuclear extract (Fig. 8C). Thus, the dsx
enhancer presents the repeats in a single-stranded
configuration in the absence (Fig. 8A) and in the presence
of nuclear extract (Fig. 8C). Similar results were
obtained in antisense experiments using D1 as the
substrate, with the exception of a significantly reduced
splicing efficiency for heat-treated D1 in the absence of
antisense (data not shown). In control experiments, the
presence of the antisense oligonucleotide at concentrations
that inhibited dsx pre-mRNA splicing did not
affect the splicing of β-globin pre-mRNA (data not
shown). We conclude that the single-stranded character
of the 13-nt repeats is essential for splicing enhancer
activity. Thus, conservation of both the sequence
and the structure of the 13-nt repeats is essential for
dsx splicing enhancer activity.

FIGURE 7. RNase protection pattern of R2-5PRE as a function of
increasing concentrations of Tra2. Lanes: T1, RNase T1 digestion at
denaturing conditions; OH, alkaline hydrolysis; con, no RNase
added. Eight microliters of 5'-labeled R2-5PRE in reaction buffer was
allowed to equilibrate with increasing concentrations of Tra2 at 30 °C
for 20 min. Each binding reaction was then adjusted with either 2 μL
of a 0.02 U/mL solution of RNase A for 8 min or with 2 μL of a freshly
prepared 3.5 U/mL solution of RNase V1 for 10 min at 30 °C.
Positions of the 13-nt repeat elements and the PRE are indicated on the
left side.

DISCUSSION 

On the basis of results presented here, we propose that 
the dsxRE adopts a secondary structure that optimizes 
interactions between individual repeat elements and 
the RNA-binding domains of Tra2 and SR proteins. In 
all cases, except repeat 2, the repeats are present in a 
single-stranded configuration. This structural differ- 
ence correlates with the observation that repeat 2 is not 
protected as efficiently from RNase A digestion at Tra2 
concentrations that maximally protect the other re- 
peats. Thus, the single-stranded character of the re- 
peats may be essential for the formation of a stable 
multiprotein complex consisting of Tra2, Tra, SR pro- 
teins, and possibly other nuclear proteins (Tian & 
Maniatis, 1993). This stable enhancer complex can then 
facilitate the recruitment of general splicing factors to 
the upstream female-specific 3' splice site (Zuo & Mani- 
atis, 1996). 

A comparison of the dsxREs from D. melanogaster and
D. virilis revealed conserved sequences at the 5' and 3'
ends of the element, and conserved repeat sequences
that are separated by highly divergent RNA sequences.
Although attempts to identify a function for the
conserved 5' and 3' sequences by in vitro splicing assays
have thus far failed (K.W. Lynch & T. Maniatis,
unpubl.), it seems likely that this conservation is
important in flies. The conservation of the repeat sequences
is almost certainly due to a conserved recognition by
Tra and Tra2, because we have shown that the
D. melanogaster and D. virilis dsxREs are functionally
interchangeable, and both require D. melanogaster Tra and
Tra2 for their functions. In addition, a recent study
demonstrated that the D. virilis Tra homologue can
partially rescue the Tra mutant phenotype in transgenic
D. melanogaster (O'Neil & Belote, 1992). Although the
primary sequence and the detailed secondary structures
of the inter-repeat sequences are poorly conserved,
our data indicate that the repeats in both
species are maintained as single-stranded RNA regions.
Thus, the inter-repeat structure may have
evolved to maintain the repeats in this configuration.
The importance of maintaining the repeats in a
single-stranded configuration was demonstrated by showing
that a 13-nt RNA complementary to the repeats inhibits
enhancer-dependent splicing. Thus, the repeat elements
within the dsx enhancer are only recognizable by
Tra/Tra2 and SR proteins when presented in a
single-stranded configuration.

FIGURE 8. Antisense inhibition of dsx pre-mRNA
splicing. The splicing efficiency of D2
was followed over 120 min in the presence or absence
of 4 μM antisense oligonucleotide complementary
to the 13-nt repeat element. D2 was (A)
preincubated with the antisense oligonucleotide
under conditions identical to those used in the
secondary structure determination prior to the
addition of nuclear extract, (B) heat annealed at
splicing conditions to the antisense oligonucleotide
prior to the addition of nuclear extract, or
(C) incubated in nuclear extract for 1-2 min prior
to the addition of antisense oligonucleotide.

Comparison of the D. melanogaster and D. virilis sequences
revealed no conservation of the PRE identified
in the dsxRE from D. melanogaster (Lynch & Maniatis,
1995). However, a distinct purine-rich sequence is
found in the D. virilis dsxRE, and this sequence is
capable of functioning as a constitutive splicing enhancer
(K.W. Lynch & T. Maniatis, unpubl.). Surprisingly, our
RNA structural probing data and the computer-assisted
folding analysis indicate that the PRE in D. melanogaster
is primarily in a duplex configuration, even
though Tra, Tra2, and SR proteins crosslink specifically
to this sequence in nuclear extracts (Lynch & Maniatis,
1996). RNA footprinting data (Fig. 7) suggest that Tra2
binding induces a conformational change in the PRE.

Although the secondary structure representations of 
the dsxRE, which are shown in Figure 6, are consistent 
with the biochemical data, it is important to note that 
each provides only one of many possible and very similar
configurations. There are several architectural similarities
in the dsxRE structures obtained. Repeats 3
and 4 in D. virilis are flanked by extended hairpins and 
separated by a four-base pair hairpin. In D. melanogas- 
ter, the structural format resembles that observed in 
D. virilis, with the exception that an additional 13-nt re- 
peat element is included. Thus, repeats 3 and 5 are 
flanked by extended helical regions and repeats 4 and 5 
are separated by a small but stable hairpin. In both pro- 
posed structures, repeat 2 is involved in the formation 
of the hairpin that flanks repeat 3. 

There is little information to speculate on the func- 
tional importance of the proposed secondary structure 
elements flanking each repeat element. Previously, a
single repeat element (Hoshijima et al., 1991) or a
synthetic tandem repeat element was shown to substitute
for the D. melanogaster dsxRE in in vivo transfection
experiments (Inoue et al., 1992). Similar results have been
obtained in vitro using HeLa cell nuclear extracts (K.J.
Hertel & T. Maniatis, in prep.). These observations
indicate that the cis-regulatory element requirement is
met by the presence of only one or two 13-nt repeat 
elements in close proximity to each other. Thus, the 
secondary structure elements in the dsxRE are not 
required for Tra- and Tra2-dependent stimulation of 
splicing as long as the single-stranded character of the 
repeats is maintained. However, the evolutionary con- 
servation of multiple repeats and their single-stranded 
nature argue strongly that both are required for the 
fine-tuned regulation of dsx pre-mRNA splicing in flies. 
For example, five of six or three of four repeat elements 
in D. melanogaster and in D. virilis, respectively, are in 
single-stranded configuration. Thus, the inter-repeat 
structure elements might influence indirectly the effi- 
ciency of splice site activation by maintaining the ac- 
cessibility of the repeat elements. In flies, where the 
levels of Tra and Tra2 are likely to be less than those 
generated in transfection or in vitro experiments, this 
arrangement may be essential for the controlled func- 
tion of the dsxRE. By contrast, the high levels of Tra 
and Tra2 used in in vitro experiments or produced in 
cotransfection experiments are sufficient to observe 
splice site activation with only a single repeat element. 

In addition to the structural analysis of the dsxRE, 
the footprinting and terminal truncation results have 
shown that the 13-nt repeat elements can act as bind- 
ing sites for Tra2. The data are therefore consistent 
with the model of a direct interaction of Tra2 with the 
13-nt repeat element. Other lines of evidence indicate 
that specific protein-RNA interactions within the repeats
are highly dependent on protein-protein interactions.
A recent in vitro binding analysis demonstrated
cooperative binding of Tra, SR proteins, and Tra2 to
the intact dsxRE (Lynch & Maniatis, 1995). When
assayed in nuclear extracts, efficient binding of Tra2 to
the 13-nt repeat is highly dependent on the presence
of Tra (Lynch & Maniatis, 1996). Given these observations,
it is very likely that the number of repeat
sequences in the dsxRE and their context-dependent
accessibility to Tra, Tra2, and SR proteins results in 
protein-RNA interactions that lead to the formation of 
an enhancer complex capable of promoting 3' splice site 
recognition at a distance. This property is not shared 
with simple constitutive enhancer elements. 

Tra2 and SR proteins, but not Tra, contain the RRM 
RNA-binding domain found in a large family of RNA- 
binding proteins involved in RNA metabolism. The 
crystal structure of the U1A-snRNA complex showed
that the RRM makes specific contacts with the single- 
stranded region of an RNA hairpin structure (Oubridge 
et al., 1994). Similarly, the single-stranded nature of the 
repeat elements of the dsxRE is consistent with the pos- 
sibility that they are recognized by the RRM of Tra2 
and possibly SR proteins. 

MATERIALS AND METHODS 
RNA 

D1 RNA was synthesized from plasmid D1 using T7 RNA
polymerase (Tian & Maniatis, 1992). The splicing substrate,
D2, in which the enhancer region of D. melanogaster was
substituted with the enhancer region of D. virilis, was
constructed by subcloning the enhancer region (inclusive of
repeat 1 to just before the 3' conserved region) into a
PCR-generated EcoR I site of D1 located just upstream of the first
repeat sequence (Lynch & Maniatis, 1995).

The probes R2-5PRE, R2-6, and D6 were generated as
described previously (Lynch & Maniatis, 1995). The construct
encoding R1-6 was made by cloning a fragment containing
the T7 transcription start site and repeat 1 into the Mlu I site
of R2-6; RNA was then synthesized by in vitro transcription.
V1-4 was transcribed from a construct in which the D. virilis
enhancer PCR fragment from D2 was cloned into the EcoR I
site downstream of the T7 promoter in SP72.

Splicing substrates were labeled uniformly with [32P]UTP.
5'-End-labeling of the oligonucleotides R2-5PRE, R2-6, R1-6,
and V1-4 synthesized by T7 RNA polymerase was
accomplished by removing 5' triphosphates with calf intestine
phosphatase followed by reaction with [γ-32P]ATP and T4
polynucleotide kinase. Oligonucleotide concentrations were
determined from specific activities for radiolabeled RNAs,
assuming a residue extinction coefficient of 8.5 × 10³ M⁻¹ cm⁻¹
at 260 nm for nonradioactive RNA.
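
The per-residue extinction coefficient converts an absorbance reading into a molar RNA concentration through the Beer-Lambert law. The sketch below is illustrative: the 300-nt length, the absorbance value, and the 1-cm path length are hypothetical inputs chosen for the example, not values from this work.

```python
RESIDUE_EXTINCTION = 8.5e3  # M^-1 cm^-1 per nucleotide at 260 nm (from the text)


def rna_molar_concentration(a260: float, length_nt: int, path_cm: float = 1.0) -> float:
    """Molar RNA concentration from A260 via the Beer-Lambert law.

    The extinction coefficient of the whole molecule is approximated as
    the per-residue value multiplied by the number of nucleotides.
    """
    if length_nt <= 0 or path_cm <= 0:
        raise ValueError("length and path length must be positive")
    molecule_extinction = RESIDUE_EXTINCTION * length_nt
    return a260 / (molecule_extinction * path_cm)


# Hypothetical example: a 300-nt transcript reading A260 = 0.255 in a 1-cm cuvette.
conc_molar = rna_molar_concentration(a260=0.255, length_nt=300)
print(f"{conc_molar * 1e9:.0f} nM")
```

Treating every residue as contributing equally ignores hypochromicity from base pairing, so the result is an estimate rather than an exact concentration.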

In vitro splicing reactions 

In vitro splicing reactions were generally conducted as
described in Tian and Maniatis (1992). The RNA antisense
experiments were conducted using identical conditions except
for the presence of 4 μM antisense oligonucleotide and
approximately 50 nM poly I-C. The presence of poly I-C was
required to reduce the double-stranded RNA deaminase
(dsRAD) activity present in HeLa cell nuclear extracts (Yang
et al., 1995). Control experiments with dsx pre-mRNA and
β-globin pre-mRNA established that the presence of poly I-C
did not affect splicing efficiency significantly. The splicing
substrate was incubated under splicing conditions with the
antisense RNA oligonucleotide for 30 min at 30 °C with or
without a prior heat anneal step (95 °C for 1.5 min in the
absence of MgCl2, followed by addition of MgCl2). The splicing
reaction was then initiated by the addition of nuclear extract,
poly I-C, and Tra/Tra2. The final concentrations were 30% (v/v)
nuclear extract, 4 μM antisense RNA, 50 nM poly I-C, 50 nM Tra, and
50 nM Tra2 in a volume of 50 μL. In another experiment, the
substrate was incubated with nuclear extract for 1-2 min prior
to the addition of poly I-C, antisense RNA, and Tra/Tra2.
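
The final concentrations translate into absolute amounts per 50-μL reaction by simple arithmetic. The following sketch only restates the numbers given above; the helper name is hypothetical and no stock concentrations are assumed.

```python
REACTION_VOLUME_UL = 50.0


def picomoles(conc_nM: float, volume_uL: float) -> float:
    """Amount in pmol for a given concentration (nM) and volume (uL).

    nM x uL gives femtomoles, so dividing by 1000 yields picomoles.
    """
    return conc_nM * volume_uL / 1000.0


components_nM = {
    "antisense RNA": 4000.0,  # 4 uM expressed in nM
    "poly I-C": 50.0,
    "Tra": 50.0,
    "Tra2": 50.0,
}

for name, conc in components_nM.items():
    print(f"{name:14s} {picomoles(conc, REACTION_VOLUME_UL):7.1f} pmol")

# 30% (v/v) nuclear extract corresponds to this volume of extract per reaction.
extract_volume_uL = 0.30 * REACTION_VOLUME_UL
print(f"nuclear extract {extract_volume_uL:.0f} uL")
```

Each reaction therefore contains 200 pmol of antisense RNA and 2.5 pmol each of Tra and Tra2, with the antisense oligonucleotide in 80-fold molar excess over either protein.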

Recombinant proteins 

Recombinant Tra and Tra2 were expressed in baculovirus and 
purified as described in Tian and Maniatis (1992). 

Cloning and sequencing of the D. virilis doublesex
female-specific exon 

The female-specific exon of D. virilis doublesex gene was iso- 
lated from a D. virilis genomic library constructed in EMBL3, 
which was kindly provided by Stuart Newfeld. The library 
was screened through successive rounds of high-stringency 
hybridization to a DNA probe that contained the sequence 
of the third intron of the D. virilis doublesex gene. The probe 
was isolated by PCR from D. virilis genomic DNA using primers
based on the published D. virilis doublesex intron sequence
(Burtis & Baker, 1989). After a positive clone was identified,
the phage DNA was isolated, digested with various restriction
enzymes, and analyzed by Southern blot to determine
the minimal fragment that contained the intron sequence. A
1.8-kb BstY I fragment, which hybridized strongly to the intron
probe, was then subcloned into SP73 and sequenced
using the T7 and SP6 priming sites. 

Enzymatic and chemical structure probing 

All reactions were conducted at 30 °C in a buffer containing
72 mM KCl, 12 mM HEPES, pH 7.9, 3.2 mM MgCl2, 1 mM
ATP, 20 mM creatine phosphate, and 4% glycerol. These
conditions were chosen to mimic those used in the splicing
reaction. For the enzymatic probing of the enhancer RNAs,
the RNases A, T1, and V1 were used. RNase A and T1
are single-strand-specific and nucleotide-specific RNases leaving
3'-phosphate products. RNase A cleavage is pyrimidine-specific
with a preference for CpN bonds (Knapp, 1989).
RNase T1 recognizes GpN bonds. Both enzymes remain active
in EDTA. RNase V1 was used to determine which portions
of the RNA are base paired under the conditions
used. V1 requires the presence of Mg2+ for activity. In a
typical structure probing experiment, a trace amount of 5'
end-labeled RNA in 10 μL reaction buffer was incubated with
either 0.02 U/mL RNase A, 0.4 U/mL T1, or 0.07 U/mL V1.
Time points (3 μL) were taken at appropriate time intervals,
mixed with a formamide buffer containing 10 mM EDTA,
0.02% bromophenol blue, and 0.02% xylene cyanol, and
immediately frozen at -70 °C until all time points were collected.
Each time point was then subjected to 6% PAGE. For all
RNA probes tested, the presence of carrier tRNA or the
addition of a denaturing/renaturing step prior to the digestion
did not result in an altered susceptibility to the RNases used.
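
The sequence specificities of the two single-strand-specific enzymes determine which bands can appear on the gel at all. As a minimal sketch, the candidate cleavage positions in a sequence can be listed as follows, taking RNase A to cut 3' of pyrimidines and RNase T1 to cut 3' of G (consistent with the G residues scored for T1 protection in Figure 5). The example sequence is hypothetical, and the sketch deliberately ignores secondary structure, which decides whether a candidate site is actually accessible.

```python
CLEAVAGE_BASES = {
    "RNase A": {"C", "U"},  # pyrimidine-specific
    "RNase T1": {"G"},      # guanosine-specific
}


def candidate_cleavage_sites(sequence: str, enzyme: str) -> list[int]:
    """Return 1-based positions of nucleotides whose 3' bond may be cut.

    The last nucleotide is skipped because it has no 3' phosphodiester
    bond to a following residue.
    """
    bases = CLEAVAGE_BASES[enzyme]
    rna = sequence.upper().replace("T", "U")
    return [i + 1 for i, nt in enumerate(rna[:-1]) if nt in bases]


example = "GACUG"  # hypothetical 5-nt RNA
for enzyme in CLEAVAGE_BASES:
    print(enzyme, candidate_cleavage_sites(example, enzyme))
```

Comparing such a list with the observed digestion pattern shows which candidate sites are cut (single stranded) and which are protected (base paired or protein bound).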

In addition to the enzymatic probing, chemical base
modification assayed by reverse transcription was used to
examine the secondary structure of the D. melanogaster enhancer
region. The DMS treatment was conducted according to Zaug
and Cech (1995). The modified RNAs were annealed to a
20-nt [γ-32P]-end-labeled DNA primer, ATTTTGTCCTTTGTCCTTTG,
corresponding to sequences between the fifth and
sixth repeats. The annealed primer/RNA complex was then
extended with Superscript reverse transcriptase (Gibco BRL).
Typically, 0.5 μg of R2-5PRE in 50 μL of reaction buffer (90 nM
R2-5PRE) was incubated for varying times with 1-3 μL of a
30% (v/v) DMS/ethanol mix. Each time point was quenched
with 0.5 × volume of 0.75 M sodium acetate and 0.5 M
β-mercaptoethanol. After ethanol precipitation and resuspension,
approximately 0.1 μg of the modified RNA in 10 μL
(80 nM R2-5PRE) was annealed to a fourfold molar excess of
5'-end-labeled DNA primer. Each of the 10-μL extension
reactions used 2 μL of the annealed primer mixture and was
supplemented with 450 mM dNTPs and 100 U of Superscript
reverse transcriptase. After 1 h at 42 °C, reactions were
terminated by the addition of an equal volume of formamide
buffer, heated to 95 °C for 2 min, and then subjected to 6%
PAGE.
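
The mass and molar amounts quoted for R2-5PRE are linked through the molar mass of the RNA. The sketch below back-calculates the molar mass implied by each pair of values; it uses only the numbers stated above, and the function name is hypothetical.

```python
def implied_molar_mass(mass_ug: float, volume_uL: float, conc_nM: float) -> float:
    """Molar mass (g/mol) implied by a mass, a volume, and a molar concentration.

    Unit conversions: ug -> g (1e-6), uL -> L (1e-6), nM -> mol/L (1e-9).
    """
    grams = mass_ug * 1e-6
    moles = (volume_uL * 1e-6) * (conc_nM * 1e-9)
    return grams / moles


# DMS reaction: 0.5 ug in 50 uL stated as 90 nM.
print(f"{implied_molar_mass(0.5, 50.0, 90.0):,.0f} g/mol")
# Primer annealing: about 0.1 ug in 10 uL stated as 80 nM.
print(f"{implied_molar_mass(0.1, 10.0, 80.0):,.0f} g/mol")
```

The two pairs imply molar masses of roughly 111,000 and 125,000 g/mol. The difference is in line with the second amount being described as approximate.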

Computer analysis 

The primary sequence data from the enhancer regions of 
D. melanogaster and D. virilis were aligned by computer to de- 
termine primary sequence conservation. They were then an- 
alyzed independently with the MFOLD and PLOTFOLD 
application programs of the Sequence Analysis Software 
Package (version 7.2) from the Genetics Computer Group, 
University of Wisconsin Biotechnology Center. Version 7.2 
makes use of an RNA-folding algorithm developed by Zuker 
(Jaeger et al., 1989) and incorporates updated bond energies. 
The secondary structure data accumulated in the experiments
described above were used to restrict the folding of nucleotides
that are predominantly in a single-strand conformation.
MFOLD generates optimal and suboptimal structures. These 
were then analyzed for agreement with the remaining exper- 
imental data. 
Abstract

This review summarizes structural and functional studies on medfly promoters and regulatory elements that can be used for 
driving sex-specific, conditional and constitutive gene expression in this species. Sex-specific and conditional promoters are impor- 
tant for generating transgenic sexing strains that could increase the performance of the Sterile Insect Technique while strong 
constitutive promoters are necessary for developing sensitive transgenic marker systems. The review focuses on the functional 
analysis of the promoters of two male-specific and heat shock medfly genes. A special emphasis is put on the potential utility of 
these promoters for developing transgenic sexing strains. 
© 2003 Elsevier Ltd. All rights reserved. 
1. Introduction

Fruit flies in the family Tephritidae are rated among 
the world's most destructive agricultural pests, 
especially in commercial fruit and vegetables. Chemical 
pesticide control is the most commonly used method 
for containing fruit fly populations with known adverse 
effects on the environment and health. During the last 
decades, there has been an increasing interest in bio- 
logical methods for control of insect pests aiming at 
replacing the existing insecticide-based control meth- 
ods. A biological method that has proven to be effec- 
tive in the field for the area-wide control of some 
insects is the sterile insect technique (SIT). SIT is a spe- 
cies-specific and environmentally non-polluting method 
of insect control that relies on the mass rearing,
sterilization, and release of large numbers of insects
(Knipling, 1955; Krafsur, 1998). If enough sterile insects are
released for a sufficient time, most of the wild females
in the field mate with the released sterile males and
thus produce no viable offspring. Highly successful,
area-wide SIT programs have been operated against
major agricultural pests such as the New World
screwworm, Cochliomyia hominivorax, the tsetse fly
(Glossina spp.) and the Mediterranean fruit fly (medfly)
Ceratitis capitata (reviewed in Robinson, 2002).

* Corresponding author. Tel.: +30-610-997-368; fax: +30-610-997-881.
E-mail address: mintzas@upatras.gr (A.C. Mintzas).

0965-1748/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi: 10.1016/j.ibmb.2003.06.016

The medfly is a notorious pest with a worldwide 
range and a history of fast expansion and painful inva- 
sions to various countries (Harris, 1989), and so far it 
is the best-studied fruit fly at the genetic and molecular 
level. For medfly, SIT has been shown to be most effec- 
tive when only sterile males are released in the field 
(Hendrichs et al., 1995). Current medfly SIT programs 
use genetic sexing strains (GSS) that are based on the 
use of male linked chromosomal translocations, where 
the translocation carries a dominant wild-type allele for 
a selectable gene. These chromosome aberration-based 
systems tend to be unstable and reduce the fitness of 
the insects, making them less effective agents for SIT 
(Robinson et al., 1999). In addition, analogous strains 
have to be constructed, de novo, for each target species,
a laborious task for insects with a limited genetic back- 
ground. 

An alternative method for making GSS is to use gen- 
etic engineering (Alphey and Andreasen, 2002) as has 
been recently demonstrated in Drosophila melanogaster 
(Thomas et al., 2000; Heinrich and Scott, 2000; Markaki
et al., this issue). In the past six years, stable genetic
transformation systems developed in several pest 
insects, including medfly, provided significant opportu- 
nities to further improve the effectiveness of SIT and to 
develop novel pest control strategies (reviewed in 
Atkinson et al., 2001). Gene transfer technology can 
lead to two major improvements of SIT: (a) develop- 
ment of transgenic sexing systems for the generation of 
novel GSS with better characteristics than the existing 
ones and (b) development of transgenic marker systems 
for detecting, maintaining and recognizing transgenic 
insects. A major advantage of these systems is that they 
are likely to be applied in a wide range of pest insects. 
Sex-specific and conditional promoters and regulatory 
elements are key components for developing transgenic 
sexing systems, while strong constitutive promoters are 
important for developing sensitive transgenic marker 
systems. A number of promoters from sex-specific, con- 
ditional and constitutive medfly genes have been 
cloned. In the present review we summarize published 
data on the structural and functional characterization 
of these promoters and present new data on the func- 
tional analysis of a conditional hsp70 promoter. 



2. Male-specific promoters 

Transgenic sexing strains can be constructed by 
using male-specific promoters to drive the expression of 
selectable genes encoding 'resistance factors' in males. 
These strains could be grown under normal conditions, 
and then switched to restrictive conditions for the last 
generation so that all females die, giving a male only 
population for sterile release programs. 

Five male-specific serum proteins (MSSPs) have been
characterized in the medfly (Katsoris et al., 1990;
Thymianou et al., 1995). The two major ones are homodimers
of two related polypeptides (MSSP-α and -β),
with molecular weights of 14.5 and 13.5 kDa respectively,
while the others are homo- and hetero-dimers of
α- and β-type polypeptides. By screening an expression
library with anti-MSSP antibodies, a cDNA coding for
an α-type polypeptide with structural similarities to the
odorant binding proteins was isolated (Thymianou
et al., 1998). A small multigene family encoding closely
related MSSP polypeptides was subsequently cloned
and characterized. This family consists of at least seven
members, divided according to sequence similarity into
three subgroups, two closely related, MSSP-α (α1, α2)
and MSSP-β (β1, β2, β3), and one more divergent,
MSSP-γ (γ1, γ2) (Christophides et al., 2000a).
Phylogenetic analysis of the MSSP gene family showed that
it has originated by gene duplications of an ancestral 
gene. The very high degree of identity, both in their 
coding and surrounding regions, predicts that MSSP 
genes have arisen by very recent gene duplications. 
Although MSSPs are mainly expressed in the male fat 
body, analytical expression studies by RNA blot hybri- 
dization and RT-PCR suggested that individual mem- 
bers of this family are expressed in a distinct sex- and 
tissue-specific manner (Christophides et al., 2000a). 

2.1. Functional analysis of MSSP-α2 and MSSP-β2 promoters

The MSSP-α2 and MSSP-β2 genes have identical 5' untranslated regions
(5' UTR) and exhibit 94.5% identity along their 504 bp upstream promoter
regions, presenting a few nucleotide substitutions and single or small
nucleotide deletions (Christophides et al., 2000b). In both genes, a
putative transcription initiation site is located 37 bp upstream of the ATG
initiation codon and 31 bp downstream of a typical TATA box. Functional
analysis of the promoters of these genes was performed in transgenic
medflies using the Minos transformation system (Christophides et al.,
2000b). For the construction of the Minos-based transposon plasmids
presented in Fig. 1A, two overlapping promoter fragments of each gene
containing the 5' UTRs and additional 5' flanking regions were fused to the
recombinant AUGβ-gal (lacZ) reporter gene (Mismer and Rubin, 1987; Thummel
et al., 1988). As shown in Fig. 1A, the two overlapping promoter fragments
(α2PS and α2PL) correspond to the -283/+37 and -522/+37 sequences of the
MSSP-α2 gene, and the analogous fragments (β2PS and β2PL) correspond to the
-287/+37 and -485/+37 sequences of the MSSP-β2 gene.

The results from the transformation experiments with the four MSSP-lacZ
constructs are shown in Table 1. Twenty-nine transgenic lines were
established from a total of 1557 G0 adults. In all α2PS and α2PL lines, lacZ
expression was exclusively detected in the fat body of adult males. In α2PS
lines, X-gal staining was detected approximately 72 h after eclosion,
whereas in α2PL lines it was detected within the first 30 h after eclosion.
Relative to the endogenous MSSP expression, which starts 24 h after eclosion
(Thymianou et al., 1995), the expression of the transgene was delayed by
50 h in α2PS lines but correct in α2PL lines. Quantitative measurements of
the β-galactosidase activity and western analysis during adult development,
in an α2PL line, showed that the expression pattern of the transgenic
protein was very similar to that of the endogenous protein. Quantification
of β-galactosidase levels in synchronized transgenic males showed that lacZ
expression in most α2PL lines was two- to twenty-fold higher than that of
the strongest α2PS line. Analysis of the β2PS and β2PL lines revealed that
the reporter gene was not expressed in the male fat body but in the midgut
of both sexes. In both lines, lacZ expression started at the pupal stage, a
few hours before eclosion, reached maximum levels on the second day, and
then declined. As with the MSSP-α2 promoter, the long MSSP-β2 promoter
fragment (β2PL) gave higher β-galactosidase levels than the respective
short fragment (β2PS).

Fig. 1. Schematic illustration of the Minos constructs used for promoter
analysis of the MSSP-α2 and MSSP-β2 genes. (A) Four overlapping fragments
from the 5' region of the genes (α2PS, α2PL, β2PS and β2PL) were fused to a
lacZ reporter gene and then cloned into the unique NotI restriction site of
a modified Minos transformation vector, pTZMiCcwNotI, which is marked with
the medfly white gene (Loukeris et al., 1995). hsp70P, hsp70T and SV40T
represent Drosophila and SV40 promoter and terminator sequences
(Christophides et al., 2000b). (B) The -283/+37 5' region of the MSSP-α2
gene (α2PS) was fused to the Adh gene encoding the FAST isoform of
Drosophila ADH and then cloned into the Minos transformation vector,
pTZMiCcwNotI, as described above. The sequence of the Adh gene, which
included 36 bp from the 5' UTR, the entire 3' UTR and 120 bp from the 3'
flanking region, is indicated by white boxes.

In conclusion, these data indicate that the -283/+37 promoter region of the
MSSP-α2 gene is able to drive basal sex-specific gene expression in the fat
body of adult males and that additional sequences in the -522/-284 5'
flanking region of the gene are responsible for transcriptional enhancement
and correct temporal expression. On the other hand, the -287/+37 promoter
region of the MSSP-β2 gene drives basal expression in the midgut of both
sexes. The -485/-288 5' flanking region of this gene does not affect the sex
and tissue specificity, but it may confer transcriptional enhancement
similar to the -522/-284 region of the MSSP-α2 gene. Since the MSSP-α2 and
MSSP-β2 genes have identical 5' UTRs, the regulatory elements responsible
for the differences in the tissue and sex specificity of their promoters
must be located in their -283/-1 and -287/-1 regions, respectively. These
regions have 12 nucleotide variations, two single deletions and one deletion
of 5 nucleotides, all dispersed along their sequence. It seems, therefore,
that within the cis-acting elements of these regions, a few nucleotides with
strong binding affinities for transcription factors may be responsible for
the differential function of the MSSP-α2 and MSSP-β2 promoters.

Table 1
Transformation experiments with MSSP promoter constructs

Constructs     G0 adults   Transgenic lines   Transformation frequency (%) a
α2PS-lacZ      404         4
α2PL-lacZ      368         13                 3.5
β2PS-lacZ      486         9                  2.5
β2PL-lacZ      299         3                  0.7
α2PS-Adh       514         13                 2.5

a Transformation frequency was calculated as percentage of transgenic lines
per G0 adults.
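
The footnote to Table 1 defines transformation frequency as the percentage of
transgenic lines per G0 adult. A minimal Python sketch of that arithmetic is
given here; the function name is hypothetical, and the only inputs used are
counts quoted in the text and in Table 1 (29 lines from 1557 G0 adults for the
four lacZ constructs pooled, 13 lines from 514 G0 adults for α2PS-Adh).

```python
def transformation_frequency(transgenic_lines: int, g0_adults: int) -> float:
    """Percentage of transgenic lines per G0 adult (hypothetical helper)."""
    if g0_adults <= 0:
        raise ValueError("g0_adults must be positive")
    return 100.0 * transgenic_lines / g0_adults


# Counts quoted in the text and in Table 1.
pooled_lacz = transformation_frequency(29, 1557)  # four MSSP-lacZ constructs pooled
a2ps_adh = transformation_frequency(13, 514)      # α2PS-Adh construct

print(f"MSSP-lacZ constructs pooled: {pooled_lacz:.1f}%")
print(f"α2PS-Adh: {a2ps_adh:.1f}%")
```

Pooling the constructs gives a single overall figure; per-construct values
follow from the same function applied to each row.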

The function of the MSSP-α2 and MSSP-β2 promoters has also been studied in a
heterologous system by transforming D. melanogaster with α2PL- and
β2PL-lacZ constructs. Interestingly, both promoters drove transgene
expression in the midgut of both Drosophila sexes, giving lacZ expression
patterns identical to those obtained in β2PS and β2PL medfly lines. Several
studies have shown that sex-specific expression of genes is poorly conserved
between species (reviewed in Schutt and Nothiger, 2000). For example,
trypsin genes of Anopheles gambiae that are expressed only in adult female
mosquitoes are expressed in both sexes of Drosophila transformants (Muller
et al., 1995; Skavdis et al., 1996). The same phenomenon has also been
observed for the apyrase gene of Anopheles gambiae, which is expressed in
the adult salivary glands of female mosquitoes. In Drosophila transformants,
the apyrase promoter, although maintaining its tissue and temporal pattern,
was expressed in both sexes (Lombardo et al., 2000). Since it is known that
the genetic basis of sex determination varies widely among insects, these
results may reflect the fundamental differences of the regulatory networks
that affect sex-specific gene expression among insects.

2.2. A potential sexing system based on male-specific 
promoters 

The idea of using the alcohol dehydrogenase (ADH) gene for medfly genetic
sexing was proposed many years ago (Robinson et al., 1986). In D.
melanogaster, an ADH-based genetic sexing strain was constructed by
combining a translocation of an Adh⁺ allele to the Y chromosome with an
Adh⁻ line (Robinson and Van Heemert, 1981). As a first step towards
developing an ADH-based genetic sexing system in the medfly, transgenic
lines carrying the Drosophila Adh-F gene under the direction of the basal
α2PS promoter were constructed (Fig. 1B) and tested for alcohol tolerance
(Christophides et al., 2001). The results from this transformation
experiment are shown in Table 1. Comparison of all the results in this table
shows that transformation frequencies vary between experiments, most likely
due to experimental manipulations. On average, the Minos element yields
transformation frequencies in medfly similar to those of the piggyBac
element (Handler et al., 1998). The established α2PS-Adh-F transgenic lines
were designated as MAD (Christophides et al., 2001). Western analysis showed
significant amounts of Drosophila ADH in adult males of several of these
lines. Northern analysis confirmed the male-specific expression of the Adh
transgene. ADH activity assays showed that both transgenic and endogenous
ADHs catalyzed the oxidation of ethanol and 2-propanol. Toxicity tests
performed with two MAD lines showed that the difference in the ADH levels
between males and females was not enough for achieving genetic sexing,
although a slightly increased tolerance was observed in males. However, an
efficient sexing strain could be made by using an Adh⁻ medfly strain as host
for transformation and/or the α2PL promoter for driving Adh expression. As
described above, the activity of this promoter is much higher than that of
the basal α2PS promoter used for constructing the MAD lines. Additionally,
elimination or modification of the 3' UTR negative transcriptional
regulatory module (AAGGCTGA) of the Drosophila Adh gene (Parsch et al.,
1999, 2000) may further increase the ADH levels and the effectiveness of the
sexing strain. Using more than one independent insertion would further
increase the robustness of the strain, though probably at the cost of some
loss of fitness for each extra insertion.



3. Female-specific promoters 

Sexing systems, based on engineered conditional 
female-specific lethal genes, have recently been 
developed and demonstrated to work efficiently in D. 
melanogaster (Thomas et al., 2000; Heinrich and Scott, 
2000; Markaki et al., this issue). In these systems, the 
regulatory elements of the Drosophila major yolk pro- 
tein (yp) genes were used to drive, directly or indirectly, 
female-specific expression of the conditional lethal 
genes. Well characterized promoters and upstream 
regulatory elements of female-specific genes from med- 
fly may be required for developing similar sexing sys- 
tems in this species. Although a number of medfly 
female-specific genes have been cloned and character- 
ized, detailed functional analysis of their promoters 
and regulatory elements has not been conducted. These 
genes encode yolk proteins, chorion proteins and
antibacterial peptides (ceratotoxins). Data on the structural
and functional analysis of these genes are summarized below.

3.1. Yolk protein genes

The two major yolk proteins (vitellogenins) of the
medfly (Vg-1 and Vg-2) are synthesized exclusively in
the fat body and the ovaries of the adult females (Rina
and Mintzas, 1988). Four vitellogenin genes have been
cloned and the sequences of two of them (vg1-γ and
vg2-δ) have been determined (Rina and Savakis, 1991).
The 5' flanking regions of these genes show no signifi- 
cant homology to the respective regions of the Droso- 
phila yp genes although several short nucleotide 
sequences have been conserved between the two spe- 
cies. A number of regulatory elements that are respon- 
sible for the sex- and tissue-specific expression of the 
Drosophila yp genes have been well characterized (Gar- 
abedian et al., 1986; Logan et al., 1989; Ronaldson and 
Bownes, 1995). Similar functional studies are necessary 
for characterizing such elements in the medfly vg genes. 

3.2. Chorion genes 

The chorion genes are expressed in the ovaries of the adult females during
the last phase of oogenesis. The main regulatory elements that are
responsible for the sex- and tissue-specific expression of these genes have
been characterized in Drosophila (Swimmer et al., 1990, 1992; Mariani et
al., 1996). Six major chorion genes have been isolated from medfly
(Konsolaki et al., 1990; Tolias et al., 1990; Vlachou et al., 1997).
Sequence comparisons of four medfly chorion genes with the respective genes
of four distantly related Drosophila species revealed the presence of well
conserved sequences in their 5' flanking regions that correspond to tissue,
temporal and amplification control elements of D. melanogaster genes
(Vlachou and Komitopoulou, 2001). Functional studies on the promoter of the
medfly s36 gene in Drosophila transformants showed that it operates in a
similar manner to the Drosophila homolog (Tolias et al., 1993).

3.3. Ceratotoxin genes 

Ceratotoxins are closely related antibacterial pep- 
tides produced in the female reproductive accessory 
glands of the medfly (Marchini et al., 1997). Cer- 
atotoxin genes are X-linked and organized in a 26 kb 
cluster (Rosetto et al., 1997; Rosetto et al., 2000). Cer- 
atotoxin transcripts are detected only in adult females 
and show maximum levels 6-7 days after eclosion. The 
presence of highly conserved motifs in the upstream 
regions of these genes suggests the presence of common 
regulatory elements. Functional studies are needed to 
investigate whether these conserved motifs contain 
important control elements. 



4. Conditional and constitutive promoters 

4.1. The hsp70 promoter

Conditional promoters, such as the heat-inducible hsp70 promoter, could be
used for developing sexing systems by driving transgenes whose expression
would lead either to female lethality or to female sex conversion (Pane et
al., 2002). The D. melanogaster hsp70 promoter has been a popular choice for
driving conditional expression of genes in other insect species. A great
number of studies have demonstrated the ability of this promoter to function
in heterologous systems (Bienz and Pelham, 1982; Voellmy and Rungger, 1982;
Burke and Ish-Horowicz, 1982; Mirault et al., 1982; Lis et al., 1982;
McMahon et al., 1984; Berger et al., 1985; Atkinson and O'Brochta, 1992).
However, the activity of this promoter in non-drosophilid insects was found
to be low relative to that in Drosophila (Berger et al., 1985; Atkinson and
O'Brochta, 1992). These data suggest that homologous hsp70 promoters should
be employed if high gene expression in other insects is required.



Six medfly hsp70 genes, organized in two 30 kb contigs, have been isolated
(Papadimitriou et al., 1998). All medfly hsp70 genes are mapped to the same
polytene chromosome band (3L:24C), corresponding to one of the major heat
shock puffs. One of these genes (Cchsp70-B1) encodes a 70 kDa protein with
84% amino acid sequence identity to the heat shock 70 proteins of D.
melanogaster. Similar to the D. melanogaster hsp70 genes, the medfly homolog
has a long A-rich 5' UTR and an AT-rich 3' UTR. These sequences are
important for efficient translation under heat shock conditions and for the
degradation of the heat shock mRNAs under normal conditions (reviewed in
Lindquist and Petersen, 1990). The Cchsp70-B1 gene has two characteristic
heat shock elements (HSEs), proximal to the TATA box, that closely match the
heat shock consensus sequence, CTnGAAnnTTCnAG (Pelham, 1982), and include
three contiguous nGAAn units arranged in alternating orientation, which
characterize a functional HSE (Amin et al., 1988). These HSEs are located in
the -85/-49 region relative to the putative transcription start site,
similar to the two proximal HSEs of the Drosophila 87C1 hsp70 gene (Ingolia
et al., 1980), which have been shown to be sufficient for optimal expression
(Dudler and Travers, 1984; Simon et al., 1985).
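
The criterion of three contiguous nGAAn units in alternating orientation can
be expressed as a simple pattern search. The Python sketch below is an
illustration only: the function name and the toy sequence are invented for
this example (the toy sequence is not medfly or Drosophila sequence), and the
pattern assumes that "alternating orientation" means nGAAn followed by its
inverted form nTTCn, in either starting order.

```python
import re

# Three contiguous 5-bp units in alternating orientation:
# nGAAn-nTTCn-nGAAn or nTTCn-nGAAn-nTTCn. The lookahead lets
# overlapping candidates be reported as separate hits.
HSE_TRIMER = re.compile(
    r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]{2}GAA[ACGT]"
    r"|[ACGT]TTC[ACGT]{2}GAA[ACGT]{2}TTC[ACGT]))"
)


def find_hse_trimers(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 15-mer) for each candidate HSE trimer."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in HSE_TRIMER.finditer(seq)]


# Invented toy sequence containing one nGAAn-nTTCn-nGAAn trimer.
toy_sequence = "GGTCGAATGTTCGCGAAATTAC"
for start, site in find_hse_trimers(toy_sequence):
    print(start, site)
```

A strict pattern of this kind only flags perfect trimers; real HSEs tolerate
mismatches, so a scoring approach would be needed for anything beyond a first
pass.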

4.2. Functional analysis of the hsp70 promoter 

Functional analysis of the Cchsp70-B1 promoter was carried out in vivo, in
transfected medfly embryos, using the chloramphenicol acetyl
transferase-encoding gene (cat) as a reporter gene. Six hsp70-cat
constructs, shown in Fig. 2, were made by subcloning PCR-amplified
overlapping fragments from the 5' region of the Cchsp70-B1 gene into the
pC4cat vector (Thummel et al., 1988). The constructs C1, C2, C3, C4 and C5
contained 391, 263, 106, 71 and 49 bp of upstream sequence of the Cchsp70-B1
gene, respectively, and the entire 5' UTR (196 bp). The construct C6 had the
same 5' flanking region as C1, but contained only the first 105 bp of the 5'
UTR. One Drosophila hsp70-cat construct (D) was also used for comparison.
This construct contained the 456 bp promoter region of the D. melanogaster
87C1 hsp70 gene (Ingolia et al., 1980) encompassing the three proximal HSEs
and the 5' UTR. Plasmid DNA from the hsp70-cat constructs was injected into
preblastoderm medfly embryos and CAT activity was subsequently measured in
embryonic extracts as described by Atkinson and O'Brochta (1992). CAT
activity could be detected in 6-h-old embryos, reached maximum levels in
24-h-old embryos and remained relatively constant until the end of
embryogenesis (Fig. 3). In 1-day-old larvae no detectable activity was
observed.

Fig. 2 summarizes the results obtained from five independent experiments for
each hsp70-cat construct under both normal and heat shock conditions, as
described in the figure caption. The results indicate that the 263 bp
upstream region of the medfly hsp70 gene is sufficient for optimal promoter
function at both normal and heat shock conditions. A 5' deletion that leaves
the two HSEs intact reduces expression levels about four-fold, while a 5'
deletion that leaves only the proximal HSE intact reduces expression levels
approximately forty-fold. These data suggest that both HSEs are necessary
for the function of the medfly hsp70 promoter and that additional upstream
sequences enhance promoter activity. Similar results have been reported for
the Drosophila hsp70 promoter. Functional studies on the Drosophila
promoter, using germline transformation (Dudler and Travers, 1984) or
transfection into cultured cells (Amin et al., 1985), have shown that a
region encompassing the two proximal HSEs is sufficient for optimal
expression, while removal of the second HSE results in a fifty- to one
hundred-fold decrease of the heat-inducible expression. The construct C6,
which contains a truncated 5' UTR, yielded similar CAT activity to C1 and
C2, suggesting that the -263/+105 region of the Cchsp70-B1 gene is
sufficient for optimal HSP70 expression at both normal and heat shock
conditions.

For all constructs, the CAT activity observed in heat shocked embryos was
not significantly higher than that of the controls. This is mainly due to
the low inducible HSP70 expression in 24-h-old medfly embryos, which is
approximately forty times lower than that of larvae and adults (unpublished
results). The embryonic restriction of HSP70 expression is a widespread
phenomenon. Organisms as diverse as fruit flies, sea urchins (Roccheri et
al., 1982), frogs (Heikkila et al., 1985) and mice (Morange et al., 1984)
restrict HSP70 inducibility in early embryos. Furthermore, the small
difference in CAT activity between heat shocked and control embryos could
also be due to an overestimation of the constitutive activity in control
embryos because of experimental manipulations. As reported for Drosophila
(Eberlein and Mitchell, 1987; Atkinson and O'Brochta, 1992), the activity
observed in control embryos reflects not only the constitutive activity of
the hsp70 promoter, but also the stressed state of the embryos due to
dechorionation, desiccation, injection and growth under halocarbon oil. The
activity of the Drosophila hsp70 promoter in medfly embryos was found to be
approximately 30% lower than that of the homologous promoter. Germline
transformation experiments in medfly showed that the inducible activity of
the homologous promoter is at least five times higher than that of the
Drosophila hsp70 promoter (unpublished results). These data are in agreement
with those reported by Atkinson and O'Brochta (1992) for Lucilia cuprina,
and strongly suggest that the hsp70 and probably other commonly used D.
melanogaster promoters may not function efficiently in non-drosophilid
insects.

Fig. 2. Functional analysis of the medfly hsp70 promoter in transfected
medfly embryos. hsp70-cat constructs were made by cloning overlapping
fragments from the 5' region of the medfly hsp70 gene (C1-C6) and the
-250/+206 fragment from the 5' region of the D. melanogaster 87C1 hsp70 gene
(Ingolia et al., 1980) (D) into the pC4cat vector containing the cat gene
(Thummel et al., 1988). Boxes indicate the positions of the TATA sequences
(T) and the HSEs (1, 2 and 3). Numbers show nucleotide positions relative to
potential transcription start sites. Plasmid DNA (250 µg/ml) from the
various constructs was injected into the posterior pole of dechorionated
preblastoderm medfly embryos (1-2-h-old) covered with halocarbon oil.
Injected embryos were incubated for 22 h at 25 °C in humidified dishes. In
heat shock experiments, 20-h-postinjected embryos were incubated for 1 h at
39 °C, after which time they were returned to 25 °C for 1 h to recover. In
each experiment, groups of 30 well shaped embryos were used to determine CAT
activity and plasmid DNA recovery according to Atkinson and O'Brochta
(1992). Recovered plasmid DNA was estimated by Southern analysis and this
value was used to normalize the CAT data. All activity data shown are
expressed as a percentage of the average heat shock activity of the C2
construct, which was set to 100, and represent mean values ±SE from five
independent experiments.

Fig. 3. Developmental expression of CAT in hsp70-cat injected medfly
embryos. Plasmid DNA from construct C2 (500 µg/ml) was injected into
preblastoderm medfly embryos as described in Fig. 2. Thirty embryos or
larvae were collected at different times after injection and subjected to
CAT assay as described by Atkinson and O'Brochta (1992). Numbers indicate
the age of collected embryos (E) and larvae (L) in hours.
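
The caption of Fig. 2 describes how the CAT data were processed: each
measurement is normalized to the plasmid DNA recovered, expressed as a
percentage of the mean heat shock activity of construct C2, and reported as
mean ±SE over five experiments. The Python sketch below illustrates that
procedure under stated assumptions: all numbers are hypothetical placeholders
in arbitrary units (they are not data from this study), the helper names are
invented, and SE is taken as the sample standard deviation divided by the
square root of the number of experiments.

```python
from math import sqrt
from statistics import mean, stdev


def normalize(cat_activity: float, plasmid_recovered: float) -> float:
    """CAT activity per unit of recovered plasmid DNA."""
    return cat_activity / plasmid_recovered


def percent_of_reference(values: list[float], reference_mean: float) -> list[float]:
    """Express values as a percentage of a reference mean (set to 100)."""
    return [100.0 * v / reference_mean for v in values]


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))


# HYPOTHETICAL readings from five experiments:
# (CAT activity, recovered plasmid DNA), arbitrary units.
c2_heat_shock = [(50.0, 1.0), (44.0, 0.8), (60.0, 1.2), (45.0, 1.0), (55.0, 1.0)]
c3_heat_shock = [(12.0, 1.0), (10.0, 0.8), (15.0, 1.2), (13.0, 1.0), (12.5, 1.0)]

c2_norm = [normalize(a, d) for a, d in c2_heat_shock]
c3_norm = [normalize(a, d) for a, d in c3_heat_shock]

reference = mean(c2_norm)  # average heat shock activity of C2, defined as 100
c2_mean, c2_se = mean_and_se(percent_of_reference(c2_norm, reference))
c3_mean, c3_se = mean_and_se(percent_of_reference(c3_norm, reference))

print(f"C2: {c2_mean:.1f} ± {c2_se:.1f}")
print(f"C3: {c3_mean:.1f} ± {c3_se:.1f}")
```

Normalizing before averaging keeps differences in injection efficiency or
plasmid loss between embryo batches from being read as differences in
promoter strength.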

4.3. Constitutive promoters 

Strong constitutive promoters are important for dev- 
eloping robust transgenic marker systems for detecting, 
maintaining and recognizing transgenic insects. The 
Drosophila actin 5C gene is actively transcribed in most 
tissues throughout development and its promoter is 
widely used for driving constitutive gene expression 
(Thummel et al., 1988). Comparative studies in trans- 
fected Drosophila embryos have shown that the actin 
5C promoter is approximately 10 times stronger than 
the hsp70 promoter. However, as with the hsp70 pro- 
moter, the strength of the actin 5C promoter was sig- 
nificantly lower in a non-drosophilid insect (Atkinson 
and O'Brochta, 1992). An actin gene (CcA1) has been
isolated from medfly, but it appears to be muscle-specific
and its expression is restricted to late pupal and
adult stages (He and Haymer, 1992).

Promoters of constitutive hsp genes are also good candidates for driving
expression of marker genes. One of the Drosophila heat shock cognate genes
(Hsc4) is expressed abundantly in all developmental stages (Craig et al.,
1983). A homolog of this gene has been isolated from medfly (Thanaphum and
Haymer, 1998). Functional studies are required to determine the strength and
specificity of this promoter. The medfly homolog of the Drosophila hsp83
gene has also been isolated and shown to be abundantly expressed throughout
medfly development (unpublished results). Functional analysis of its
promoter is currently in progress.

5. Conclusions 

During the past six years, combined efforts in several 
laboratories have led to great progress in the field of 
gene transfer technology in pest insects. Novel trans- 
genic sexing systems have been developed and proved 
to work effectively in D. melanogaster. However, sev- 
eral components of these systems, such as promoters, 
regulatory elements and effector genes, may not work 
efficiently in non-drosophilid insects. Indeed, as poin- 
ted out in this review, a number of promoters show dif- 
ferent sex and tissue specificity as well as strength 
between Drosophila and other insect species suggesting 
that homologous components should be considered for 
developing efficient transgenic sexing and marker sys- 
tems in pest insects. 